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Editorial 

Message from Managing Editor 



The International Journal of Computer Science and Information Security (IJCSIS) a monthly, 
peer reviewed, online open access journal that publishes articles which contribute new results 
and theoretical ideas in all areas of Computer Science & Information Security. The editorial board 
is pleased to present the September 2015 issue. The core of the vision is to publish and 
disseminate new knowledge and technology for the benefit of all, ranging from academic 
research and professional communities to industry professionals in a range of topics in computer 
science and information security in general. It also provides a publication place for high-caliber 
researchers, practitioners and PhD students to present ongoing research and development in 
these areas. We are glad to see variety of articles focusing on the major topics of innovation and 
computer science; big data analytics; Biometric security; mobile computing; network security and 
IT applications. This scholarly resource endeavors to provide international audiences with the 
highest quality research manuscripts and accounts of the constant evolution of information 
science and technology in whole. Researchers, academicians, practitioners and doctoral students 
will find this journal as a critical source of reference. 

Over the last years, we have revised and expanded the journal scope to recruit papers from 
emerging areas of green & sustainable computing, cloud computing security, forensics, mobile 
computing and big data analytics. IJCSIS archives all publications in major academic/scientific 
databases and is indexed by the following International agencies and institutions: Google Scholar, 
CiteSeerX, Cornell’s University Library, Ei Compendex, Scopus, DBLP, DOAJ, ProQuest, ArXiv, 
ResearchGate and EBSCO. 

We thank and congratulate the wonderful team of editorial staff members, associate editors, and 
reviewers for their dedicated services to select and publish high quality papers for publication. In 
particular, we would like to thank the authors for submitting their papers to IJCSIS and 
researchers for continued support to IJCSIS by citing papers published in IJCSIS. Without their 
continued and unselfish commitments, IJCSIS would not have achieved its current premier status. 

“We support researchers to succeed by providing high visibility & impact value, prestige and 
excellence in research publication.’’ 

For further questions please do not hesitate to contact us at iicsiseditor&gmail. com . 
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1. Paper 31081501: An Efficient Big Data Analytics Platform for Mobile Devices (pp. 1-5) 

Ngu Wah Win, University of Computer Studies, Yangon, UCSY 
Thandar Thein, University of Computer Studies, Yangon, UCSY 

Abstract — Big data analytics technologies are to extract value from very large data volume, variety of data, and 
highly rate of data stream. With the fast deployment of cloud services with mobile devices, big data analytics is 
shifting from personal computer to mobile devices. But, significant limitations of mobile devices are less storage 
amount and processing power. This paper proposes a big data analytic platform on mobile cloud computing with 
efficient query execution time by developing MapReduce Transformation Process and query operation based on 
input query’s complexity level. Furthermore, this paper presents the process of RESTful web service for providing 
seamless connectivity between mobile devices and cloud storage, where store the data and all of necessary 
processing steps are done by cloud backend. This proposed platform is evaluated by using Census dataset and 
compared the result with other traditional high level query languages, such as Pig, Hive, and Jaql. 

Keywords- big data analytic; mobile cloud computing; Hadoop MapReduce; RESTful web serivce 



2. Paper 31081519: Face Recognition based on Spatial and Transform Domain Techniques (pp. 6-14) 

Jagadeesh H S, Dept, of Electronics and Communication Engineering, APSCE, Bangalore, India 
Suresh Babu K, Dept, of Electronics and Communication Engineering, UVCE, Bangalore, India 
K B Raja, Dept, of Electronics and Communication Engineering, UVCE, Bangalore, India 

Abstract — Real time challenges are eternal for verification and recognition of a person using face biometric. In this 
paper, we propose an efficient algorithm using Extended Directional Binary codes (EDBC) with Discrete Wavelet 
Transform (DWT) and Singular Value Decomposition (SVD) with Discrete Cosine Transform (DCT). In 
preprocessing, the face images are refined to make them suitable for feature extraction. The Approximate (LL) band 
is extracted by applying DWT. One hundred EDBC features are extracted using LL sub band. Additionally, DCT is 
applied on pre-processed image and one hundred SVD features are extracted from DCT output. Final one hundred 
distinct features are derived by fusing each of one hundred EDBC and SVD features. The features comparison is 
performed using Euclidean Distance (ED) measure to produce the results. Performance on JAFFE and ORL face 
datasets shows that the proposed algorithm is superior in different metrics compared to other existing methods. 

Keywords- Discrete wavelet transform; Euclidean distance; Extended directional binary codes; Face biometric; 
Singular value decomposition. 



3. Paper 31081526: Exchange Rates Forecasting Using Variable Length Moving Average - NARX (pp. 15-18) 

Agus Sihabuddin, Subanar, Dedi Rosadi, Edi Winarko 

Computer Science Graduate Program, Faculty of Mathematics and Natural Science, Gadjah Mada University, 
Yogyakarta - Indonesia 

Abstract — This paper evaluates whether the variable-length moving average (VMA) as input in NARX outperform 
the univariate exchange rate forecasting performance. Six major rates of monthly data from January 1975 to April 
2014 (USDAUD, USDCAD, USDEUR, USDGBP, USDJPY and USDCHF) are used to test the proposed model 
with a (1,5,0) VMA rule. We evaluate that the VMA can be used as input for NARX model and the forecasting 
accuracy is outperform the NAR univariate model with 19.97% improvement on Dstat and 3.17% improvement on 
MSE. 



Keywords — forecasting, major exchange rates, VMA, NAR, NARX 



4. Paper 31081527: Implementing the Database Users Privilege Model: Effects of possible successful SQL 
Injection attack in web applications (pp. 19-21) 

Gem Ralph Caracol, Soomi Yang 

Information Security Department, Suwon University, Hwaseong City, South Korea 

Abstract — For more than a decade, many solutions and detection mechanisms have been proposed to prevent web 
applications from SQL Injection attacks. However, according to the 2013 OWASP 10, SQL Injection is still a top 
threat and attackers can find evasive ways to exploit this vulnerability. We implemented the proposed database level 
Database Users Privilege Model of [6]. Our findings suggest that by implementing the model, it does not detect SQL 
Injection attacks but with the necessary recommendations, it can prevent the attacker from fully compromising the 
database and the database server. 

Keywords-SQL Injection, Database Users Privilege Model, OWASP, Database Security, Web Application 



5. Paper 31081528: Security Issues and Solutions for Android-based Mobile Devices (pp. 22-27) 

Klever R. P. Cavalcanti, Edejair Viana 

Department of Statistics and Informatics, Federal Rural University of Pernambuco, Pernambuco, Brazil 
Fernando A. A. Lins, Department of Statistics and Informatics, Federal Rural University of Pernambuco, 
Pernambuco, Brazil 

Abstract — Currently, mobile devices are being widely used by of a considerable number of people. The need to be 
connected 24 hours a day is becoming a reality, because users need to make online purchases, make payments, 
access social networks, surf the Internet, check e-mails and so on. In this context, users with mobile devices, 
especially smartphones, using the Internet to connect to specific applications and safety issues may arise because, 
for example, sensitive data may be sent over an insecure channel (Internet). The aim of this paper is to present an 
overview of current security risks and security solutions related to smartphones based on the Android platform. 
Security risks have been divided into five categories, and these risks are presented and detailed on the corresponding 
categories. In addition, still stand out security solutions that are currently available on Android stores and these 
solutions can be used to eliminate or mitigate the risks. 

Keywords- Android; Mobile; smartphone; security; Application. 



6. Paper 31081534: Intrusion Detection and Prevention Systems (IDPS) State of Art (pp. 28-35) 

Homam Reda El-Taj, Community College Univesity of Tabuk, Tabuk, Saudi Arabia 

Abstract — Over the past few years, we have witnessed an ever-increasing rampant battle between network 
managers & computer criminals which led to many developments in the tools used by both parties i.e. " legitimate 
computer professionals & computer criminals". Recently, interruption exposure & avoidance systems have earned 
increasing appreciation due to their more frequent involvement in the security domain since they allow security 
analysts to make fast reactions and thus take quick decisions to prevent any possible damage. This underlines the 
significance of upgraded detection & prevention processes, taking into consideration the competitive and dynamic 
network environment these days. Most commercial and government information systems are connected through the 
internet, which will expose them to possibilities of spasms. The increasing number of computer users marks it 
precise, ever since computers have become a necessary device in our lives. This article covers the IDPS state of art 
and discusses the IDPS challenges. 

Keywords- component; Network, IDS, IPS, Security, IDS Myths, IDS Alerts. 



7. Paper 31081520: Improving the Quality of Composite Services Through Improvement of Cloud 
Infrastructure Management (pp. 36-44) 

Olga Shpur, Department of Telecommunication, Lviv Polytechnic National University, Lviv, Ukraine 
Mykhailo Klymash, Department of Telecommunication, Lviv Polytechnic National University, Lviv, Ukraine 
Marian Seliuchenko, Department of Telecommunication, Lviv Polytechnic National University, Lviv, Ukraine 
Bogdan Strykhaliuk, Department of Telecommunication, Lviv Polytechnic National University, Lviv, Ukraine 
Orest Lavriv, Department of Telecommunication, Lviv Polytechnic National University, Lviv, Ukraine 

Abstract — For improving the quality of composite services in cloud infrastructure this paper proposes an integrated 
control architecture using the NVF technology that provides load balancing with estimation of available system 
resources. The aim of our proposed algorithms is to assess existing physical and virtual, resources of 
telecommunication system. The analysis is based on the maximum value of integral resources index of each physical 
machine. Maximum values of available virtual and physical resources are transferred to Orchestrator, which 
provides the migration of service components to less loaded servers if necessary. Performance analysis of the 
proposed approach shows that load balancing by implementing integrated management architecture based on NVF 
technology allows to reduce the duration of service requests by 3 times. 

Keywords - cloud infrastructure management; load balancing; NVF; analysis cloud resources; duration of service 
requests 



8. Paper 31081502: Service-Oriented Architecture for Secure Service Discovery and Selection in Specialized 
Mobile Networks (pp. 45-71) 

M. Adel Serhani, College of Information Technology, UAE University, Al-Ain, UAE 

Yasser Gadallah, The American University in Cairo, Egypt 

Ezedin Barka, College of Information Technology, UAE University, Al-Ain, UAE 

Abstract - Special operations such as emergency response as well as military missions are usually characterized by 
the limited resources available to handle generally large-scale operations. Precise resource discovery and allocation 
thus becomes an important factor for the success of such operations. This task has been recognized as a challenging 
research issue. This is due to the dynamic nature of the emergency response elements e.g., personnel and equipment. 
One of the important requirements of these operations is achieving the security of the communications involved in 
the resource discovery and allocation tasks. Security ensures the confidentiality, integrity, and availability of the 
communicated information. Therefore, solutions that are intended for selecting best matching service(s) should not 
only rely on the functional properties of the service but also on level of security under which this service is provided. 
In this study, we propose a secure multicast service discovery architecture that is based on a mobile ad hoc network 
(MANET) of operation participants. The main objective is to locate, select, reserve and assign certain resources to 
parties that are in need of these resources. The involved communications are designed to be secure multicast-based, 
utilizing features of the Role-Based Access Control (RBAC) Model. We describe the details of the proposed 
communication protocol. We then qualitatively compare our architecture to other alternative MANET-based service 
discovery architectures. The comparison highlights the merits of the proposed architecture. Finally, we conduct and 
present the results of a set of experiments to evaluate key features of our proposed architecture. 

Keywords: Service-Oriented Architecture, Secure Service Discovery, MANET, Special Operations, Access Control, 
Service Selection. 



9. Paper 31081511: Vehicle Segmentation Using K-Means with Fuzzy Logic (pp. 72-77) 

(I) Shakila basher, (2) Purushothaman S., and (3) Rajeswari P. 

(1) Department of MCA, VELS University, Chennai, India. 

(3) Department of Electrical and Computer Science Engineering 
(2,3) Institute of Technology, Haramaya University, DireDawa, Ethiopia 



Abstract - This paper presents methods for vehicle segmentation. The camera can be fixed or moving which can 
be used to capture the moving vehicle. During this process, the orientation of the vehicle captured can be in any 
direction. Many segmentation methods available. However, K-Means with Fuzzy logic can be still more appropriate 
in segmenting the vehicles moving on the road. 



10. Paper 31081513: Effective feature selection in multi-label classification problems using genetic algorithms 
(pp. 78-82) 

Somayeh Fattahi Ferdowsi, Department of computer, Zanjan Branch, Islamic Azad University, Zanjan, Iran 

Abstract- Designing a classifier in a multiple-label classification issue in which the number of features used for 
describing each sample is high and the number of samples is low, is faced with many problems. The features 
describing each sample can be divided into three categories: relevant, irrelevant and redundant. Redundant and 
irrelevant features could seriously influence the accuracy of classification in such issues. In spite of works carried 
out, the issue of feature selection in multiple-label classification issues is still considered as a challenge. For this 
reason, the issue of feature selection has been discussed in this paper. The aim of feature selection problem is 
finding a subset of features in order to modify and improve the accuracy of estimation without loss of accuracy that 
classifier using the selected features performs the classification of data. This study was aimed to enhance the 
efficiency of classification through decreasing the error and to increase classification speed through selecting a 
subset of effective features in classification of multi-label data as well as using mutual information, Genetic 
Algorithm and Rank SVM classification The performance of proposed method has been evaluated on three data 
collections including Emotions, Scene and Yeast that are available on Mulan website. 

Keywords: feature selection, multi-label classification, genetic algorithms and classification Rank SVM 



11. Paper 31081515: Vehicle Tracking Using Locally Weighted Projection Regression Method (pp. 83-89) 

(I) Shakila basher, (2) Purushothaman S., and (3) Rajeswari P. 

(1) Department of MCA, VELS University, Chennai, India. 

(3) Department of Electrical and Computer Science Engineering 
(2,3) Institute of Technology, Haramaya University, DireDawa, Ethiopia 

Abstract - This paper presents the method of tracking vehicle in video frames using Locally weighted projection 
regression (LWPR). The coordinates of the segmented vehicle image are presented to the LWPR. Based on the 
coordinates of the previous video frames, the LWPR estimates the next position of the vehicle. The LWPR is trained 
with coordinates of the vehicle obtained from few frames. Based on the learned information the next movement of 
the vehicle is estimated without processing next few video frames. 



12. Paper 31081518: Multilevel Extensible and Dynamic of Mobile Establishment Concepts (pp. 90-91) 

Ammar Es-Said, University Hassan II/ Faculty of Science Ben M'sik, Casablanca Morocco 
Labriji El Houssine, University Hassan II/ Faculty of Science Ben M'sik, Casablanca Morocco 

Abstract — Mobile establishment of masts is an exclusive competence a control power, basically regarding town 
planning, mobility characterizes what could move or be moved, which can change place, this multilevel extensible, 
dynamic notion intuitive the activity however by three different aspects, and as many approaches, ‘nomadisme’, 
ubiquity, the sensitive system in context, nevertheless the use of these devices remains immersive, these devices 
requires all the attention, independently from this one. This approach is often called ‘nomasime 4 , although this term 
can take different significance in other fields to find proximity, In ray of influence that remains to be determined, 
mobility is in fact related to features of the increasing data of the computing mobile. 

Keywords — mobility, extensible, mobile devices, WPAN, mobile failures, wired, Mobile Technology 



13. Paper 31081530: Forensic Investigation of User’s Web Activity on Google Chrome using Open-source 
Forensic Tools (pp. 92-100) 

Narmeen Shafqat, Baber Aslam 

Dept of Information Security, MCS, National University of Science and Technology, Rawalpindi, Pakistan 

Abstract — Cyber Crimes are increasing day by day, ranging from confidentiality violation to identity theft and 
much more. The web activity of the suspect, whether carried out on computer or smart device, is hence of particular 
interest to the forensics investigator. Browser forensics i.e forensics of suspect’s browser history, saved passwords, 
cache, recent tabs opened etc. , therefore supply ample amount of information to the forensic experts in case of any 
illegal involvement of the culprit in any activity done on web browsers. Owing to the growing popularity and 
widespread use of the Google Chrome web browser, this paper will forensically analyse the said browser in 
windows 8 environment, using various forensics tools and techniques, with the aim to reconstruct the web browsing 
activities of the suspect. The working of Google Chrome in regular mode, private “Google Incognito Mode” and 
portable modes of operation is discussed at length in this paper. 

Keywords — Browser forensics, Private web browsing, Chrome Incognito, Chrome forensics, Portable browser 
forensics, Chrome artifacts. 



14. Paper 31081532: Natural Language Processing and Machine Learning: A Review (pp. 101-106) 

Fateme Behzadi, Computer Engineering Department, Bahmanyar Institute of Higher Education, Kerman, IRAN 

Abstract — Natural language processing emerges as one of the hottest topic in field of Speech and language 
technology. Also Machine learning can comprehend how to perform important NLP tasks. This is often achievable 
and cost-effective where manual programming is not. This paper strives to Study NLP and ML and gives insights 
into the essential characteristics of both. It summarizes common NLP tasks in this comprehensive field, then 
provides a brief description of common machine learning approaches that are being used for different NLP tasks. 
Also this paper presents a review on various approaches to NLP and some related topics to NLP and ML. 

Keywords- Natural Language Processing, Machine learning, NLP, Ml. 



15. Paper 31081533: A User-Aware Approach to Provide Context Aware Web Service Composition (pp. 107- 

120 ) 

Sihem Cherif MIRACL, ISIMS, Cite El Ons, Route de Tunis Km 10, Sakiet Ezziet 3021, Sfax, Tunisia 
Raoudha Ben Djemaa, MIRACL, ISIMS, Cite El Ons, Route de Tunis Km 10, Sakiet Ezziet 3021, Sfax, Tunisia 
Ikram Amous, MIRACL, ISIMS, Cite El Ons, Route de Tunis Km 10, Sakiet Ezziet 3021, Sfax, Tunisia 

Abstract — Web services Compositions are rapidly gaining acceptance as a fundamental technology in the web 
fields. They are becoming the cutting edge of communication between the different applications all over the web. 
With the need for the ubiquitous computing and the pervasive use of mobile devices, the context aware web service 
composition becomes a hot topic. This later aims to adapt the web service composition behavior according to the 
user’s context such as his specific work environment, language, type of Internet connection, devices and 
preferences. Many solutions have been proposed in this area. Nevertheless, the adaptation was carried out only at the 
runtime and it partially covered the user’s general context. In this paper, we introduce a new context-aware approach 
that provides dynamic adaptation of service compositions. Our approach allows to express requirements by taking 
into account potential user’ s context in addition to the functional one. 

Keywords- component; UDDI, AAWS-WSDL, Dynamic context, SAB PEL, CAC-WSR 



16. Paper 31071533: Analysis and Detection of the Zeus Botnet Crime ware (pp. 121-135) 



Laheeb Mohammed Ibrahim, Software Engineering, Mosul University, Collage of Computer Sc. & Math., Mosul, 
Iraq 

Karam H. Thanon, Software Engineering, Mosul University, Collage of Computer Sc. & Math., Mosul, Iraq 

Abstract — The raised significant evolution of the Internet next to the development of the high prevalence of 
computers, smart phones and the Internet on a large scale in most of the trends of life, but this use leads to network 
attacks, with a large use of e-commerce, they needs websites on the Internet. E-commerce represents a good reason 
for criminals or attackers to be diverted to profit law. Recently, the attackers used botnets to achieve their goals. A 
comprehensive study of botnet is done in this paper , study a life cycle of botnet, the attack on the behavior, 
topologies and technologies of botnet, studied of Zeus robots (An ethical penetration operation has been done using 
Zeus botnet version 1.2.7.19 and using Zeus version 2. 0.8. 9) were is done in in detail to determine its characteristics, 
and be able to detect it in the computers on the Internet. Host Botnet Detection Software (HBD's) is designed and 
implemented to detect Zeus botnet in user’s computers. The HDB's depends on information obtained from studying 
Zeus in addition to information obtained from analysis (an analysis of Zeus bot has been done by using reverse 
engineering tool (Ollydbg reverse engineering tool)) and penetration operation. In order to remove Zeus botnet from 
victim computers. 

Keywords - Zeus, Host Botnet Detection Software (HBDS), Botnet, Ollydbg reverse engineering tool, Zeus botnet 
version 1.2.7.19 and using Zeus version 2. 0.8. 9 



17. Paper 31081525: Security and Cryptography on World Wide Web (pp. 136-141) 

Okal Christopher Otieno, Department of Information Technology, Mount Kenya University, Nairobi, Kenya 
Magati Steve Biko, Department of Information Technology, Mount Kenya University, Nairobi Kenya 

Abstract - Security has been a major concern for internet users regarding the potential for harm that a breach in their 
systems can present to the landscape. The world is evolving towards an internet dependent computing architecture 
considering the advancements in cloud computing and other such technologies. They promise various 
benefits mostly relating to the cost of operation for the businesses and personal Internet users. This investigation has 
the intention of determining the role that cryptography plays in ensuring the safety of the systems and the 
information that people share over the internet. The paper looks at the types of cryptography that are in broad use in 
the field of information technology and how they enable the protection that the users need. There are important 
protocols of protection including Internet Protocol Security Standard (IPSec) and Field Programmable Gate Arrays 
(FPGAs) that use cryptographic techniques. The investigation concludes that cryptography is an important 
contributor to Internet security through the implementation of such procedures. 



18. Paper 31081521: LDA-PAFI: Linear Discriminate Analysis Based Personal Authentication using Finger 
Vein and Face Images (pp. 145-152) 

Manjunathswamy B E, Dr Thriveni J, Dr Venugopal K R 

Department of Computer Science and Engineering, University Visvesvaraya College of Engineering, Bangalore, 
India 

Abstract — Biometric based identifications are widely used for individuals personnel identification in recognition 
system. The unimodal recognition systems currently suffer from noisy data, spoofing attacks, biometric sensor data 
quality and many more. Robust personnel recognition can be achieved considering multimodal biometric traits. In 
this paper the LDA(Linear Discriminate analysis) based Personnel Authentication using Finger vein and Face 
Images (LDA-PAFF) is introduced considering the Finger Vein and Face biometric traits. The Magnitude and Phase 
features obtained from Gabor Kernels is considered to define the biometric traits of personnel. The biometric feature 
space is reduced using Fischer Score and Linear Discriminate Analysis. Personnel recognition is achieved using the 
weighted K-nearest neighbor classifier. The experimental study presented in the paper considers the (Group of 
Machine Learning and Applications, Shandong University-Homologous Multimodal Traits) SDUMLA-HMT 



multimodal biometric dataset. The performance of the LDA-PAFF is compared with the existing recognition 
systems and the performance improvement is proved through the results obtained. 

Keywords-SDUMLA_HMT; LDA-PAFF ; Phase; Magnitude; fisher Score 



19. Paper 31081516: The Socio-Political Influences of the Globalization of the IT Industry (pp. 153-157) 

Okal Christopher Otieno 

Department of Information Technology, Mount Kenya University, Nairobi, Kenya 

Abstract - This paper seeks to establish the sociopolitical impacts that the globalization of the information 
technology industry has in different societies. The review focuses on defining the two concepts of IT and 
Globalization and later outlines the relationship that exists between them. IT is an important contributor to the 
process of globalizing different communities and economies. While most people might view the relationship as 
unidirectional, it is untrue since globalization also impacts how the IT industry develops through various markets. 
The paper focuses on the two objectives of IT that are automation of processes for better efficiency and integrating 
people by promoting communication between them. These goals match the definition of globalization that describes 
their connection. Globalizing IT has different impacts on the politics and policy implementations of various 
locations through the flow of information that influences the people in governments. Other effects include the 
formation of trade blocs that are influential on the performance of the IT industry. 
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Abstract — Big data analytics technologies are to extract value 
from very large data volume, variety of data, and highly rate of 
data stream. With the fast deployment of cloud services with 
mobile devices, big data analytics is shifting from personal 
computer to mobile devices. But, significant limitations of mobile 
devices are less storage amount and processing power. This paper 
proposes a big data analytic platform on mobile cloud computing 
with efficient query execution time by developing MapReduce 
Transformation Process and query operation based on input 
query’s complexity level. Furthermore, this paper presents the 
process of RESTful web service for providing seamless 
connectivity between mobile devices and cloud storage, where 
store the data and all of necessary processing steps are done by 
cloud backend. This proposed platform is evaluated by using 
Census dataset and compared the result with other traditional high 
level query languages, such as Pig, Hive, and Jaql. 

Keywords-big data analytic; mobile cloud computing; 
Hadoop MapReduce; RESTful web serivce 

I. Introduction 

The Internet generates the largest amount of data and it 
has exceeded the zetabyte levels. Processing the high 
volume of data is beyond the computational capabilities of 
traditional data warehouses, giving rise the term Big Data 
[3, 4]. Cloud computing is the powerful platform because of 
their well-known services. It can give many advantages to 
users by allowing them to use infrastructure, platforms and 
software by cloud providers at low cost and elastically in an 
on demand fashion [1]. After the number of mobile phone 
usage is many times the number of personal computers, 
these small portable devices that can access information are 
already part of everyday life for hundreds of millions of 
people in the developed world. Mobile devices need to 
borrow storage and computing power from the cloud 
because of their limited resources. When mobile devices try 
to access a shared pool of computing resources, cloud 
computing becomes mobile and accessing data in the cloud 
from mobile devices is becoming a basic need. 

Because of this stream of technology requirements, 
many researches emphasize to integrate mobile device and 
big data analysis to gain the business facilities by using 
mobile web services. Mobile web services allow deploying, 
discovering and executing of web services in a mobile 
communication environment using standard protocol. Web 
service can be classified into two main categories: RESTful 
and SOAP-based web services. On the other hand, Hadoop 
is becoming the core technology in big data analytic to solve 
the business problem for large organizations with 
MapReduce programming model. The server level 
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architecture for Big Data consists of parallel computing 
platforms that can handle the associated volume and speed. 
Clusters or grids are types of parallel and distributed 
systems, where a cluster consists of a collection of 
interconnected stand-alone computers working together as a 
single integrated computing resource, and a grid enables the 
sharing, selection, and aggregation of geographically 
distributed autonomous resources dynamically at runtime. A 
commonly used architecture for Hadoop consists of client 
machines and clusters of loosely coupled commodity servers 
that serve as the HDFS distributed data storage and 
MapReduce distributed data processing. 

The MapReduce is the programming model for data 
processing [9, 10, 11, 12, 13]. It operates via regular 
computer that uses built-in hard disk, not a special storage. 
Each computer has extremely weak correlation where 
expansion can be hundreds and thousands of computers. 
Since many computers are participating in processing, 
system errors and hardware errors are assumed as general 
circumstances, rather than exceptional. With a simplified 
and abstracted basic operation of Map and Reduce, many 
complicated problems can solve. Programmers who are not 
familiar with parallel programs can easily perform parallel 
processing for data. It supports high throughput by using 
many computers. As the core technology of the Hadoop is 
the MapReduce parallel processing model, all of the high 
level query languages that run on Hadoop are the 
MapReduce based query languages such as Hive, Pig, and 
JAQL. This paper presents a big data analytic platform for 
mobile device with different OS and concludes with 
experimental results based on query execution time. 

II. Related work 

There are many types of existing big data analytic 
platforms for large scale data. Most of them based on 
MapReduce, distributed file system, and no-SQL indexing. 
Tableau [2] is known for its strong visualization features, 
which can support exploratory or discovery analytics. 
Analytics aside, Tableau is also used as an all-purpose BI 
platform, applied to either enterprise or departmental needs. 
The visual approach seen in Tableau enables high ease of 
use so that - with simple drag-and-drop methods - an 
analyst or other user can interact directly with the 
visualization and other visual controls to form queries, 
reports, and analyses. If the user knows the basics of 
enterprise data, he or she doesn’t need to wait for assistance 
from IT. With a few mouse-clicks, a user can access a 
database, identify data structures of interest, and bring big 
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data into server memory for reporting or analysis - all in a 
self-service manner. 

The Vertica Analytics platform has a high-speed, 
relational SQL DBMS purpose-built for analytics and 
business intelligence. Vertica has helped over 300 customers 
monetize their data in unique ways, including Zynga, JP 
Morgan, Verizon, Comcast, Vonage, Blue Cross Blue 
Shield, and others. The Vertica Analytics platform offers a 
shared-nothing, MPP column- oriented architecture, and has 
been benchmarked by many customers as being on average 
lOx to 200x faster than other solutions. It also uses 
compression very aggressively, both of data on disk and on 
data “in motion” during queries, which further enhances 
query speed while enabling cost-effective storage 
management. The Vertica Analytics Platform runs on 
clusters of inexpensive, industry- standard Linux servers and 
requires limited resources up front for setup and performance 
configuration. Unlike most solutions in this space, Vertica 
was purposely built from the ground up for today’s most 
demanding analytics challenges. 

Teradata Database is famous for supporting large and 
mostly centralized Enterprise Data Warehouse (EDWs) that 
yield scalability and fast performance, despite the fact that 
they’re supporting concurrent mixed workloads, such as 
those for standard reports, performance management, OLAP, 
advanced analytics, and real-time or streaming data. 
Furthermore, Teradata ’s support for third normal form and 
in-database analytic processing makes it a good platform for 
managing and analyzing detailed big data. The centralized 
EDW has distinct advantages. Yet, some Teradata customers 
need analytic databases outside the main Teradata System. In 
response, Teradata introduced a line of data warehouse 
appliances and acquired Aster Data. Since then, Aster Data 
has received a patent on its native SQL integration with 
MapReduce called SQL-MapReduce (with Hadoop lacks). 
And Teradata continues to improve support for partnering 
analytic tools and platforms. 

Our big data analytic platform for mobile devices 
provides a solution to reduce the query processing time 
based on complexity of query that requested by mobile users 
with MapReduce Transformation Process and query mode 
operation. To achieve the seamless connectivity between 
mobile and cloud storage, we used RESTful web service 
technology. By using this platform, users send a request from 
their mobile device and get back the results without 
noticeable amount of time. 

III. MOBILE CLOUD COMPUTING AND BIG DATA 
ANALYTICS CONCEPT 

A. Mobile Cloud Computing 

Mobile cloud computing (MCC) at its simplest, refers to 
an infrastructure where both the data storage and data 
processing happen outside of the mobile device. Mobile 
cloud applications move the computing power and data 
storage away from the mobile devices and into powerful and 
centralized computing platforms located in clouds, which are 
then accessed over the wireless connection based on a thin 
native client. Improving data storage capacity and processing 



power: it enables mobile users to store/access large data on 
the cloud and helps to reduce the running cost for 
computation intensive applications [7]. 

B. Big Data Analytic 

Big data analytics requires massive performance and 
scalability - common problems that old platforms can’t scale 
to big data volumes, load data too slowly, respond to queries 
too slowly, lack processing capacity for analytics and can’t 
handle concurrent mixed workloads [5, 6]. 

There are two main techniques for analyzing big data: 
the store and analyze, and analyze and store. The store and 
analyze integrates source data into a consolidated data store 
before it is analyzed. The advantages of this are improved 
by data integration and data quality management, plus the 
ability to maintain historical information. The disadvantages 
are additional data storage requirements and the latency 
introduced by the data integration task. 

Analyze and store technique analyzes data as it flows 
through business processes, across networks, and between 
systems. The analytical results can then be published to 
interactive dashboards and published into data store for user 
access, historical reporting and additional analysis. This can 
also be used to filter and aggregate big data before it is 
brought into a data warehouse. 

C. Hadoop Distributed File System and MapReduce 

The Hadoop distributed file system (HDFS) [19] is 
designed to store very large data sets reliably, and to stream 
those data sets at high bandwidth to user applications. 
HDFS stores file system metadata and application data 
separately. As in other distributed file system, HDFS stores 
metadata on a dedicated server. All servers are fully 
connected and communicate with each other using 
transmission control protocol (TCP) based protocols. The 
following figure shows the Hadoop distributed file system 
architecture. 

Hadoop MapReduce is a software framework for easily 
writing applications which process vast amounts of data in 
parallel on large clusters of commodity hardware in a 
reliable, fault-tolerant manner. The framework sorts the 
outputs of the map, which are then input to the reduce tasks. 
Typically, both the input and the output of the jobs are stored 
in a file system. The framework takes care of scheduling 
tasks, monitoring them and re-executing the failed tasks. 

D. RESTful Web Service 

REST stands for Representational State Transfer: it is a 
resource oriented technology and it is defined by Fielding in 
[15] as an architectural style that consists of a set of design 
criteria that define the proper way for using web standards 
such as HTTP and URIs. Although REST is originally 
defined in the context of the web, it is becoming a common 
implementation technology for developing web services. 
RESTful web services are implemented with web standards 
(HTTP, XML and URI) and REST principles. REST 
principles include addressability, uniformity, connectivity 
and stateless. RESTful web services are based on uniform 
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interface used to define specific operations that are operated 
on URL resources. 

E. MapReduce based High Level Query Languages 

A number of HLQLs have been constructed on top of 
Hadoop to provide more abstract query facilities than using 
the low-level Hadoop Java based API directly. Pig, Hive, 
and JAQL are all important HLQLs. Programs written in 
these languages are compiled into a sequence of 
MapReduce jobs; to be executed in the Hadoop MapReduce 
environment. Apache Hive [14, 15, 16] is an open-source 
data warehousing solution built on top of Hadoop. Hive 
provides an SQL dialect, called Hive Query Language 
(HiveQL) for querying data stored in a Hadoop cluster. 
Apache Pig [17, 18] provides an engine for executing data 
flows in parallel on Hadoop. It includes a language, 
PigLatin, for expressing these data flows. PigLatin includes 
operators for many of the traditional data operations, as well 
as the ability for users to develop their own functions for 
reading, processing, and writing data. Jaql [8] is a 
declarative scripting language for analyzing large 
semistructured datasets in parallel using Hadoop’ s 
MapReduce framework. It consists of a scripting language 
and compiler, as well as a runtime component for Hadoop. 
It is extremely flexible and can support many semistructured 
data sources such as JSON [9], XML, CSV, flat files and 
more. 



IV. Proposed big data analytic platform 

Big data analytics involves analyzing large amounts of 
data of a variety of types to uncover hidden patterns, 
unknown correlations and other useful information. In this 
paper, we proposed a big data analytics platform for mobile 
device and improve the query execution time of user request. 
In our proposed platform consists of four layers: storage 
layer, processing layer, web service layer (for data 
transmission) and application layer. 

• Storage layer, storage layer is to store the data in 
DataNode of Hadoop distributed file system. When 
a file is placed in HDFS it is broken down into 
blocks, 64MB block size by default. The default 
replication is 3, i.e. there will be 3 copies of the 
same block in the cluster. Hadoop follows the 
master-slave architecture. The slave machines run 
dataNode to store data with distributed architecture 
that supported by Hadoop. 

• Processing layer, processing layer is to work 
together with storage layer. The main components 
of this layer are TaskTracker and JobTracker. After 
the JobTracker receive a request from client, it 
assigns TaskTracker which task to be performed. 
Normally, JobTracker is run on master machine 
and it tries to connect with salve machine, to 
execute the data, where DataNodes are running. 
TaskTracker is a daemon that accepts tasks (Map 
and Reduce) from the JobTracker and sends 
progress/status information of Map and Reduce 
tasks to the JobTracker. 
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Figure 1 . Sysem flow of the proposed big data analytic platform 

• Web service layer, web service layer is responsible 
for providing seamless connection between client 
mobile device and cloud storage. At the same time, 
it reduces the complexity of result from cloud 
storage to become a light-weight data transfer. 

• Application layer, application layer which operates 
the user request by using Pig, Hive and Jaql. This 
proposed platform combined MapReduce 
Transformation Process to perform user requested 
simple queries. 

A. System Llow of Proposed Platform 

Firstly, mobile user sends request from mobile device 
and this request pass to the cloud storage by using RESTful 
web service. After web service receiving the request, it 
determines whether the request is simple query or not by 
matching local simple query list. If the incoming request is 
simple query type, it will work through red arrow line and if 
it was a complex query, it will work through black arrow 
line. 

For the simple query request, we developed a 
MapReduce Transformation Process which perform ad-hoc 
simple query by using Map and Reduce function. In this 
stage, it breaks the query into sub-queries according to the 
query decomposition rules. These sub-queries work with 
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multiple Mapper classes and Reducer class which produces 
the final output result. All of these Mappers and Reducer are 
Map and Reduce tasks of the TaskTracker nodes. 

For the complex query request, the platform will work 
with traditional query processing language, HiveQL, is more 
efficient than other high level query languages, Pig and Jaql. 
After testing above query languages, we can conclude that 
HiveQL is three times faster than other two languages. 
HiveQL extract data from DataNodes with hive server and 
hadoop-hive driver. In these two mechanisms, the query 
processing time of the proposed platform is faster when it 
used MapReduce Transformation Process when they run 
same queries. Because, HiveQL is need to transform query 
into MapReduce form to combine with Hadoop Distributed 
File System. But the proposed MapReduce Transformation 
Process is not need to transform MapReduce form and 
processing time of sub-queries is effectively reduce the 
overall query processing time. The output result of the 
Reducers from both query engines is the text file format and 
this text file is transformed into JSON format to be a light 
weight message for mobile user. The RESTful web service 
returns the JSON output to the mobile user and the mobile 
devices need to develop a convenient data visualization 
application that can change the received JSON output to 
become a user friendly graphical representation form. By 
developing this platform, mobile user can applied big data 
analytic process on cloud infrastructure with efficient query 
processing time. 

B. Experiment Environment 

We implement the mobile platform for big data analytic 
and evaluate on different Operating Systems and different 
high level query languages. To build a storage cluster, we 
created 16 VMs for NameNode, Secondary NameNode, 
DataNode, JobTracker and TaskTracker. 



TABLE I. EXPERIMENT PARAMETERS 



Parameters 


Specification 


OS 


- Ubuntu 14.04 Linux, 

- Red Hat Enterprise Linux 6.4 


Host 

Specification 


Intel ® Core i7-2600 CPU @ 3.40GHz, 
Intel ® Core i7-3770 CPU @ 3.40GHz, 
8GB Memory, 1TB Hard Disk 


VMs 

Specification 


1GB RAM, 50 GB Hard Disk 


Mobile 

Device 

Specification 


Huawei G730-U00, Android OS version 4.2.2 (Jelly 
Bean), Quad-core 1.3 GHz Cortex- A7, 4GB internal 
memory 


Software 

Component 


- Hadoop 1.1.2 

- Hive 0.14.0, Pig 0.12.1, Jaql 0.5.1 


Data Set 


US census dataset [20], 
-114 GB in size 



The specification of devices and necessary software 
components used in mobile cloud infrastrue, and dataset ued 
in MapReduce processing are described in table 1 . 

C. Result Discussion 

In this platform, we test many queries and record the 
query processing time of traditional query languages and 
proposed MapRedcue Transformation Process on both 
operating systems, Red Hat and Ubuntu. The following 

figure shows the one of the tested query. 

The HiveQL (Hive Query Language) is 

hive> create table population (ID int, FILEID string, 
STUSAB string, CHARITER string, CIFSN string, 
LOGRECNO string, POPCOUNT int) row format 
delimited fields terminated by V stored as textfile; 

hive> load data inpath Vuser/root/Rec250000.csv' 
overwrite into table population; 

hive> select STUSAB, sum(POPCOUNT) from 
population group by STUSAB; 

The PigLatin is 

grunt> population = load Vuser/root/families.csv' using 
PigStorage(V) as (ID: int, FILEID: 

chararray, STUSAB: chararray, CHARITER: chararray, 
CIFSN: chararray, LOGRECNO: chararray, 

POPCOUNT: int); 

grunt>grouped = group population by STUSAB; 

grunt> result = foreach grouped generate group, 
SUM(population.POPCOUNT) ; 

grunt> dump result; 

The Jaql is 

jaql>$ population = read(del("/user/root/families.csv”, { 
schema: schema { ID: long, FILEID: string, STUSAB: 
string, CHARITER: string, CIFSN: string, LOGRECNO: 
string, POPCOUNT: long} })); 

jaql> $population -> group by $STUSAB={$. STUSAB} 
into {$STUSAB, total :sum($[*] .POPCOUNT)}; 



Figure 2. Sample query of Pig, Hive, Jaql and MapReduce 

transformation process 

Figure 3 shows the query processing time of different 
query languages on different operating system by varying 
workloads. 

According to this figure, other query languages take a 
large amount of time to execute a query. From a querying 
point of view, we can conclude that Hive query language 
and MapReduce Transformation Process of proposed 
platform is better than other query language on both OS and 
MapReduce Transformation Process is faster than Hive 
query language on Red Hat and Ubuntu also. From the 
operating system point of view, we can also conclude that 
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the Red Hat OS is more convenient than Ubuntu OS for this 
proposed platform. 
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Figure 3. Comparison of query processing time on different 

OS 



V. Conclusion 

In this paper, we implement a big data analytic platform 
for mobile deivees. This platform operates with RESTful 
web service technology to provide seamless connectivity 
between mobile device and cloud storage. To improve the 
query performance, we developed a MapReduce 
Transformation Process to transform users’ requests into 
MapReduce form. To control the complex query request, we 
implement a traditional querying method, HiveQL, in this 
platform. The analytical result is transferred to the mobile by 
using RESTful web service technology. As a result, 
performance evaluations are conducted to prove that the 
proposed platform provides three times faster than other high 
level query languages in both operating systems. 
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Abstract — Real time challenges are eternal for verification and 
recognition of a person using face biometric. In this paper, we 
propose an efficient algorithm using Extended Directional Binary 
codes (EDBC) with Discrete Wavelet Transform (DWT) and Singular 
Value Decomposition (SVD) with Discrete Cosine Transform (DCT). 
In preprocessing, the face images are refined to make them suitable 
for feature extraction. The Approximate (LL) band is extracted by 
applying DWT. One hundred EDBC features are extracted using LL 
sub band. Additionally, DCT is applied on pre-processed image and 
one hundred SVD features are extracted from DCT out put. Final one 
hundred distinct features are derived by fusing each of one hundred 
EDBC and SVD features. The features comparison is performed 
using Euclidean Distance (ED) measure to produce the results. 
Performance on JAFFE and ORL face datasets shows that the 
proposed algorithm is superior in different metrics compared to other 
existing methods . 

Keywords- Discrete wavelet transform; Euclidean distance; 
Extended directional binary codes; Face biometric; Singular value 
decomposition. 

I. INTRODUCTION 

National security issues have to be paid highest priority 
than others in the present scenario. Threats through territory 
interfaces such as land, naval and airports cannot be neglected. 
Human beings has major role in creating these threats, hence 
person identification is the critical stage in any technological 
implementation. Applications using biometrics have deployed 
enough to recognize humans. Biometrics are either physical 
structure of organs or any behavioral trait associated with 
human beings. Hand geometry, fingerprint, face, iris are some 
physiological biometric parameters normally extracted directly 
[1] and the activities such as typing style, voice, gait are few 
behavioral parameters. Each biometric parameter has its own 
advantage and disadvantage in different applications. Face 
biometric has distinct advantage of non intrusive and captured 
without the cooperation of humans also. Facial expression 
recognition [2] is an augmented part any approach. 

A general Face Recognition (FR) system accepts face 
images as input and processed in different steps to give 
decisions e.g. known/ unknown person, any details of the same 
person. Major steps of FR are preprocessing, feature extraction 
and matching. The performance of a FR system is influenced 
by resolution of input images, methods and tools used at 
different steps. If the input contains low resolution [3] images 
or having variation in illumination, pose [4] and expression 



[5], they are improved in its quality through preprocessing. 
Principal Component Analysis (PCA) is used to handle the 
data corrupted by error [6] and it reduces the dimension [7] of 
input data also. Pose is another vital factor which affects on 
the appearance of image and in-turn on the performance of 
recognition algorithm [8]. Many tools such as Discrete Cosine 
Transform (DCT) [9], Discrete Wavelet Transform (DWT), 
Focal Binary Patterns (EBP) [10], Singular Value 
Decomposition (SVD) and many others are used to derive 
unique features. Directional Binary Codes (DBC) [11] on 
Hong Kong Polytechnic University Near Infra Red (Poly-U 
NIR) face database have promising performance. Then DBC 
with DWT combination used in [12], have shown an 
improvement in recognition accuracy. The accuracy varies for 
expression and poses variation with different databases. The 
Extended DBC (EDBC) proposed in our work extracts precise 
texture features in all directions contributes to improve 
efficiency. The energy efficient DCT with SVD join hands 
towards distinct features extraction and to boost performance. 
Access control [13], surveillance, law enforcement, driving 
licenses, ATMs for bank [14], tele-medicine, information 
security, and forensics [15] applications uses biometrics. 

Contribution: DE-DS FR algorithm is proposed to 
recognise the face of a person, which uses EDBC and SVD to 
extract features. Face cropping and resizing are performed as 
pre-processing. Euclidean distance measure is used for 
matching. 

Organization: Rest of the paper is organized as follows; 
survey of related work is in section II; DE-DS FR model and 
algorithm is in section III and IV respectively. Performance is 
discussed in section V and inferred in section VI. 

II. RELATED WORK REVIEW 

Rasber D Rashid et al., [16] proposed a feature extraction 
algorithm for face recognition problem using Local Binary 
Patterns (LBP) and wavelets. LBP histograms are derived 
from the approximation sub band of wavelets. The face image 
is represented by single histogram. Results using Euclidean 
distance matching on Yale and ORL databases are better 
compared to other LBP based methods. Radhey Shyam and 
Yogendra Narain Singh, [17] presented augmented local 
binary patterns for face recognition in uncontrolled 
environments. The proposed method works on uniform and 
non-uniform patterns locality, then computes Bray Curtis 
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dissimilarity metric. Performance is substantially improved on 
Yale A and Extended Yale B database. Yang Zhao et al., [18] 
proposed completed robust local binary pattern and weighted 
local gray level for texture classification. The proposed 
scheme replaces central pixel by average gray level of 3*3 
dimensions. It is immune to noise and illumination variants. 
Experimental results on texture databases such as Outex 
database, UIUC database, CUReT database, and XU_HR 
database achieves better classification accuracy. 

Jianfeng Ren et al., [19] proposed a Transformation for 
converting LBP feature appropriate to Gaussian distribution. 
Asymmetric principal component analysis is used to remove 
the unreliable dimensions. Euclidean distance with nearest 
neighbor classifier is used as the baseline algorithm. The 
proposed algorithm is evaluated for face recognition, dynamic 
texture recognition, and protein cellular classification. The 
performance is consistent on 8 different datasets. Di Huang et 
al., [20] presented a 3D face representation scheme using 
multi-scale extended local binary patterns with SIFT feature 
matching. The discriminative power of proposed approach for 
recognition and verification of face is proved better on FRGC 
v2.0 database. Khoa Luu et al., [21] introduced an age- 
determination technique using hybrid facial features with 
Kernel Spectral Regression (KSR). The input image is 
preprocessed using logarithmic non sub sampled contourlet 
transform. Uniform local ternary patterns are the local features 
and holistic features are extracted by active appearance model. 
The combination of local and holistic features yields hybrid 
facial features. Feature set's inter class distances are 
minimized and intra class distances are maximized using KSR. 
The efficacy of proposed approach yields promising results on 
FGNET, PAL and Vietnamese Longitudinal Face database. 



each pixel iteratively. Results are better on AR, Extended Yale 
B and Multi-PIE databases compared with other state of the art 
methods. Zizhu Fan et al., [26] proposed Weighted SRC 
(WSRC) method for representation and classification problem. 
It computes the weight for a training sample according to the 
distance or similarity relationship. WSRC concentrates those 
training samples which are similar to the test sample. The 
experiments on AR, Georgia Tech (GT), CMU PIE, and 
Labeled Faces in the Wild face data sets show that the 
proposed algorithm can achieve desirable recognition 
performance. 

Yunlian Sun and Massimo Tistarelli, [27] presented Coarse 
to fine sparse representation technique for the recognition of 
faces. The coarse phase determines an individual dictionary 
for each test sample to produce the least residual with nearest 
neighbours. The experimental results on Extended Yale B, AR 
and ORL face databases shows competitive performance in 
comparison with other state-of-the-art techniques. JunYing 
Gan et al., [28] designed a real-time face recognition system 
based on IP camera and SRC algorithm using OpenCV and 
C++ programming. The algorithm has reconfigured to process 
the video frame also. AdaBoost algorithm is used to detect 
face in each frame, and then LBP is used to extract the textual 
features. The results show that the system can deal with real- 
time video and is robust to illumination. Xingjie Wei et al., 
[29] proposed Dynamic Image to Class Warping (DICW) 
algorithm for the recognition of occluded faces. Dynamic 
programming technique is used to compute the image to class 
distance for classification. Experiments on the FRGC, AR, 
The face we make and LFW face databases show that DICW 
achieves promising performance on various types of 
occlusions. 



Fujin Zhong et al., [22] proposed discriminant locality 
preserving projection method for face recognition problem. It 
uses L-l norm for preserving dispersion between classes and 
within a class. The proposed method is feasible and robust to 
outliers by overcoming the small sample size problem. The 
experimental results on artificial, Binary Alpha digits, FERET 
face and Poly U palm print dataset have demonstrated the 
effectiveness of the proposed method. Bing-Kun Bao et al., 
[23] proposed Corruptions Tolerant Discriminant Analysis 
(CTDA) algorithm to capture the features with high amount of 
similarity within class. CTDA can handle well the gross 
corruptions existing in the training data. Experiments on CMU 
PIE, FERET, and AR face data sets and Pittsburgh food image 
dataset object recognition data set show that CTDA 
outperforms compared to other related algorithms. 

Sumit Shekhar et al., [24] proposed a sparse representation 
method for multimodal biometrics recognition. Alternative 
direction method is used to solve the optimization problem 
and data nonlinearity is handled by kernelizing the algorithm. 
Recognition accuracy is improved on AR face database and 
WVU multimodal data set. Meng Yang et al., [25] presented a 
regularized coding model with iterative feature for robust face 
recognition. The maximum posterior estimation solves coding 
problem and weights are assigned based on coding residuals of 



Jian Zhang et al., [30] presented Nearest Orthogonal 
Matrix Representation (NOMR) for face recognition. The 
specific individual subspace of each image is estimated and 
represented uniquely by singular value decomposition. The 
proposed NOMR is more robust for alleviating the effect of 
illumination and powerful in handling the small sample size 
problem. Performance on Extended Yale B, CMU-PIE, 
FRGCv2, AR and CUHK Face Sketch databases demonstrated 
encouraging performance compared with the state-of-the-art 
methods. Dapeng Tao et al., [31] proposed human behaviour 
recognition scheme for wireless sensor networks using 
hamming sensing, Rank Preserving Discriminant Analysis 
(RPDA) and a nearest neighbour classifier. RPDA encodes 
local rank information of within class samples and 
discriminative information of the between class using patch 
alignment framework. Performance on the SCUT Naturalistic 
3D acceleration based activity dataset demonstrate the 
effectiveness of RPDA for human behaviour recognition. Jing 
Wang et al., [32] proposed Adaptive Sparse Representation- 
Based Classification (ASRC) framework, in which sparsity 
and correlation are jointly considered. For the samples with 
low correlation, ASRC selects the most discriminative samples 
for representation, like SRC; when the training samples are 
highly correlated, ASRC selects most of the correlated and 
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discriminative samples for representation, rather than choosing 
some related samples randomly. Experiments on UCI 
repository data sets verify the robustness of the proposed 
algorithm by comparing it with the state -of-the art methods. 

III. PROPOSED WORK 

The definitions for performance analysis and the block 
diagram of the proposed DWT - EDBC and DCT- SVD based 
Face Recognition with (DE - DS FR) model are discussed in 
this section. 

A. Definitions 

• False Acceptance Rate (FAR): It is the ratio between the 
number of unauthorized persons accepted to total number 
of un-authorized persons. 

• False Rejection Rate (FRR): It is the ratio between the 
number of authorized persons rejected to total number of 
authorized persons in the database. 

• Total Success Rate (TSR) or Recognition Rate (RR) [17]: 
It is the ratio between number of persons recognized 
correctly to total number of persons in database. 

• Equal Error Rate (EER) : It is a point at which the value of 
FRR is same as FAR value. 

• Error Rate (ER): It is the redundancy associated with RR 
and given by ER= 1-RR. 

B. Block Diagram 

Explanation for the proposed DE - DS FR model shown in 
Figure 1 is as follows. 

1 ) Datsbases 

The performance of proposed work is evaluated on 
Japanese Female Facial Expression (JAFFE), and Olivetti 
Research Eaboratory (ORE) face databases. The JAFFE [33], 
database consists of 10 Japanese female subjects with 20 
images per subject of 7 facial expressions. The horizontal and 
vertical resolution of each image is 100 dpi (dots per inch) 
with 8 bits depth and the size is 256*256. JAFFE database 
samples of a person with different expressions are as shown in 
Figure 2. 

The ORE face database [34], has 40 different subjects each 
with 10 images. The images captured at different times with 
varying lighting intensities. In addition, facial expression 
variations such as opening or closing of eyes, smiling or non- 
smiling face images and wearing or without wearing glasses 
facial images are also present in the data base. All the subjects 
are in up-right, frontal position and the images are acquired 
against a dark homogeneous background. All images are in 
JPEG format with each image of size 92*112 size and 
represented by 8 -bit grey levels. The horizontal and vertical 
resolution of each image is 96 dpi. Figure 3 shows the sample 
of images of different persons in ORE face database. 




Figure 1. Block diagram of the proposed DE - DS FR model 



2 ) Pre-processing 

To make appropriate for feature extraction, the database 
and test images are refined in pre-processing. The gray scale 
images are converted to binary before applying following two 
steps. 

a) Face cropping 

The details such background and any redundant part in 
input image has no impact in person identification. Hence, 
more precise portion of face is elicited in cropping and then it 
is divided into two parts. Each part is scanned horizontally and 
vertically to identify and remove redundant area in the image. 

b) Image resizing 

The dimension of all cropped images may not be same; 
hence all cropped images are uniformly resized to 100*100. 
Figure 4 shows the effect of pre-processing on ORL data base 
face image. 




Figure 2: JAFFE database image samples of a person 




Figure 3: ORL database image samples of three persons 
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EDBC 135 1 = 101111000 (376), EDBC 18(U = 111101100 (492), 
EDBC 225 ’i = 111011100 (476), EDBC 270 ,i = 101011000 (344), 
EDBC 3 i 5? i = 110010111 (407). Figure 6 depicts the complete 
process of EDBC features extraction. 



Figure 4: Pre-processing effect (a) Original image (b) Pre-processed image 

3) Feature extraction 

The vital facial features of each image are elicited using 
Extended Directional Binary codes (EDBC) and Singular 
Value Decomposition (SVD) separately and fused. In addition 
with existing DBC (4-DBC), which extracts four directional 
information, eight directional infromation is extracted and 
named as EDBC ( 8 -DBC). The one level DWT with Haar 
wavelet [35], is used to decompose preprocessed image. 
Output of DWT contains approximation and detail 
components such as LL, HL, LH, HH sub bands, where LL is 
approximate, and HL, LH, HH are horizontal, vertical, 
diagonal sub bands respectively. The output size of each sub 
band is reduced to half compared with input size, which makes 
processing time faster. LL sub band is used to extract features 
as it contains more appropriate details compared to remaining 
three sub bands. 

EDBC is applied on LL sub band of DWT output to extract 
the eight directional details. EDBC determines spatial 
relationship in all eight directions associated with eight 
different neighborhood pixels. The LL band of size 50*50 is 
partitioned into 100 cells each with 5*5 size. For each cell, 
first order derivative I a d is computed, where a- 0°, 45°, 90°, 
135°, 180°, 225°, 270°, 315° and d is the distance between the 
given point and its neighboring point. The eight directional 
derivatives (I ad ) at a point Zy are calculated using Equations 
from 1 to 8 . Equation 9 converts derivative form to binary 
form. Only central 9 values out of 25 values of a 5*5 matrix of 
each cell, forms 9 bit binary code as shown in Figure 5 and the 
sequence of reading is given in Equation 10. 















Ii-l.j-1 


tu 


Ii-lj+l 






Iy-i 


h 


Ii,j+1 






Ii+lj-1 


1+1, j 


Ii+lj+l 















Figure 5. A 5*5 cell with bolded 5*5 central mask 

The computed EDBCs along 0°, 45°, 90°, 135°, 180°, 225°, 
270°, and 315° directions with corresponding decimal values 



(inside bracket) are given by: EDBC 0 ,i : 
EDBC 45 1 = 110001101 (397), EDBC 90 1 



110000110 (390), 
100010111 (279), 



I 0,d( Z i,j) l(^i,j+d) 


(i) 


1 45,d( Z ij) d,j+d) 


(2) 


I 90,d( Z i,j) ^( Z i,j) d,j) 


(3) 


I 135,d( Z i,j) ^( Z i,j) d,j— d) 


(4) 


I 180,d( z i,j) ^( Z i,j) d) 


(5) 


I 225,d( Z i,j) ^( Z i,j) l(^i+d,j— d ) 


(6) 


I 270,d( Z i,j) ^( Z ij) l(^i+d,j) 


(7) 


I 315,d( Z i,j) l(^i+d,j+d) 


(8) 


f r V I a,d ( Z ) — 0 

1 0 a,d ( z ) J = | i T ' 

U* a,d ( z ) > 0 


(9) 



EDBC a4 (Zij) = { f (l' a ,d(Zij)), f (I a ,d(Zi-d,j)), f (I’a.dfe-dj-d))’ 

f 0 a,d( Z i.j-d))’ 1 (1 a,d( z i+d,j-d))> f (I a,d(^i+d,j))> f (I a,d(^i+d,j+d)X 

f (I'a.dfe.j+d))’ f (l’a,d( z i-d,j+d)) } ( 10 ) 

On the other side, Discrete Cosine Transform (DCT) is 
applied on the preprocessed image to produce energy 
compacted image. The basis function of DCT is real and easy 
to compute. SVD is computed on DCT output image, which 
retains textual features. SVD is a process of factorizing a 
given matrix in to three parts, consisting of Left Singular 
Vectors (LSV), singular values and Right Singular Vectors 
(RSV). SVD [36] of a matrix A with M*N dimension is given 
in Equation 1 1 . 



A= USV 1 



(ID 



Where, U (LSV) is a M*M orthogonal matrix contains 
Eigen vectors as columns of AA T , V (RSV) is a N*N 
orthogonal matrix whose columns are the Eigen vectors of 
A t A, and S is a M*N diagonal matrix contains singular values 
in decreasing magnitudes till, r = rank (A). 
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Figure 6. EDBC along 8- directions 
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The diagonal elements of S are square roots of the Eigen 
values of A T A named as singular values of A and are always 
real in nature. Even matrices U and V are also real, when 
matrix A is real. Consider an image of JAFFE database with 
size 256*256 as shown in Figure 7. Resized matrix of the 
same image of size 10*10 is represented in matrix Equation 12 
and corresponding ten singular values of matrix A are 
[1106.27, 247.48, 213.11, 94.56, 42.80, 32.39, 25.6, 12.05, 
10.64, 5.22]. SVD is applied on the DCT output of size 
100*100 to generate 100 singular values; these are fused with 
100 EDBC features to yield final 100 features. 

4 ) Matching 

Final step of the proposed work is to produce the results. 
The system has to verify and decide whether test image is in 
database or not, and also to validate correct recognition of a 
person. Euclidean Distance (ED) measure is used to generate 
the result. Let I(pi,qi) and J(p 2 ,q 2 ) are any two points on a 
given 2D plane, then ED is calculated by Equation 13. The ED 
value between feature vectors of a test image and feature 
vectors of all images in the database are computed. Image 
number and the corresponding Person in the database are 
identified based on minimum ED value. Finally, ED values of 
all images are compared with entire threshold range varied 
from 0 to 1 in steps of 0.1 or still precise. 

RR is calculated based on correct matching count of 
persons. Correct matching count is incremented when ED 
value is less than the threshold and also the queried person 
should be same as database person. On the other hand, 
mismatch occurs if the ED value is less than the threshold and 
person from the database and test image of a person are 
different. FRR is a count incremented when the ED value is 
greater than threshold, infers that the person in the database is 
falsely rejected. 




Figure 7: JAFFE database image 
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For FAR calculation, the test images are taken from out of 
database and ED value is compared for all images of database. 
FAR count is incremented by one when the ED value is less 
than threshold, which conveys that even the image is not there 
in database, but falsely accepting. Lastly, if the ED value is 
higher than threshold, it is considered as mismatch and to 
conclude that the system is correctly rejecting. 

IV. ALGORITHM 

Problem definition: Identification of a person is made by 
using EDBC and SVD based Face Recognition system with 
DWT & DCT (DE - DS FR). The objectives are as follows: 

• To improve the RR 

• To reduce FRR and FAR. 

The proposed DE-DS FR model algorithm is shown in 
Table I. 

V. PERFORMANCE ANALYSIS 

The proposed algorithm is tested on JAFFE and ORL face 
database. 

A. Performance analysis using JAFFE database 

JAFFE database consists of 10 persons with 20 images per 
person. For 8 persons each with 18 images are considered for 
database and remaining images are used for testing. The 
experiments are conducted to observe performance of EDBC 
with SVD (without DCT) by varying number of images used 
for database. Table II compares %FRR, %FAR and %RR for 
12, 15 and 18 images used for database. As the number of 
trained images are increasing, the % RR is also increasing e.g. 
% RR is 37.5, 50 and 62.5 at 0.4 threshold and reaches to 
maximum of 87.5, 100 and 100 respectively for the 3 cases. 

The performance on JAFFE database is tested for both 
DBC and EDBC with SVD (without DCT). The Table III 
compares %FRR, %FAR and %RR for different values of 
threshold on JAFFE database. The %RR is 25 % higher in 
EDBC with SVD compared with DBC with SVD at 0.5 
threshold value. 

TABLE I. ALGORITHM OF PROPOSED DE-DS FR MODEL 



Input: Database and Probe images of face 
Output: Identification / Refutation of a person. 

1. Preprocessing involves resizing an image to 100*100 after 
face cropping. 

2. Approximate band of 50*50 size is considered by applying 
DWT. 

3. Approximate band is divided into 100 cells with each cell 
of 5*5 size. 

4. Computation of 8 - directional derivatives for each cell. 

5. 9-bit code generated from each cell along 8- directions & 
converted into its decimal equivalent. 

6. One decimal value per cell in all directions correspondingly 
averaged to constitute 100 features. 

7. DCT of preprocessed image is performed. 

8. SVD is computed on DCT output & fused with 100 co- 
efficient features of step 6. 

9. Euclidean Distance between database and test image 
feature vectors is computed. 

10. Matching is decided for an image with minimum distance. 
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TABLE II. PERFORMANCE OF EDBC WITH SVD ON JAFFE DATABASE (WITHOUT DCT) 



Threshold 


Case-1 : 12 images database 


Case-2 : 


15 images 


database 


Case-3 : 


18 images 


database 


% FRR 


% FAR 


% RR 


% FRR 


% FAR 


% RR 


% FRR 


% FAR 


% RR 


0.0 


100 


0 


0 


100 


0 


0 


100 


0 


0 


0.1 


100 


0 


0 


100 


0 


0 


100 


0 


0 


0.2 


100 


0 


0 


100 


0 


0 


100 


0 


0 


0.3 


75 


0 


25 


100 


0 


0 


100 


0 


0 


0.4 


62.5 


0 


37.5 


50 


0 


50 


37.5 


0 


62.5 


0.5 


37.5 


50 


62.5 


0 


50 


100 


0 


50 


100 


0.6 


0 


50 


87.5 


0 


50 


100 


0 


50 


100 


0.7 


0 


100 


87.5 


0 


100 


100 


0 


100 


100 


0.8 


0 


100 


87.5 


0 


100 


100 


0 


100 


100 


0.9 


0 


100 


87.5 


0 


100 


100 


0 


100 


100 


1.0 


0 


100 


87.5 


0 


100 


100 


0 


100 


100 



TABLE III. PERFORMANCE COMPARISON ON JAFFE DATABASE 
(WITHOUT DCT) 



Thres 

hold 


DBC with SVD 


EDBC with SVD 


% 

FRR 


% 

FAR 


% 

RR 


% 

FRR 


% 

FAR 


% 

RR 


0.0 


100 


0 


0 


100 


0 


0 


0.1 


100 


0 


0 


100 


0 


0 


0.2 


100 


0 


0 


100 


0 


0 


0.3 


87.5 


0 


12.5 


100 


0 


0 


0.4 


62.5 


0 


37.5 


37.5 


0 


62.5 


0.5 


25 


0 


75 


0 


50 


100 


0.6 


0 


100 


100 


0 


50 


100 


0.7 


0 


100 


100 


0 


100 


100 


0.8 


0 


100 


100 


0 


100 


100 


0.9 


0 


100 


100 


0 


100 


100 


1.0 


0 


100 


100 


0 


100 


100 



The plot of FAR and FRR versus threshold on JAFFE 
database using SVD with DBC and EDBC (without DCT) is in 
Figure 8 and 9 respectively. Optimum %RR on JAFFE 
database is obtained at EER i.e. when both %FRR and %FRR 
are same. 




Figure 8: FAR & FRR versus threshold on JAFFE database using DBC 
with SVD (Without DCT) 



TABLE IV. 

PERFORMANCE COMPARISON OF SVD WITH DBC & 

EDBC ON JAFFE DATABASE (WITHOUT DCT) 



Method 


EER 


Optimum 
% RR 


Maximum 

%RR 


DBC with SVD 


20 


80 


100 


EDBC with SVD 


22 


87.5 


100 




Figure 9: FAR & FRR versus threshold on JAFFE database 
using EDBC with SVD (Without DCT) 

Table IV compares EER, Optimum %RR and Maximum % 
RR for DBC with SVD and EDBC with SVD. Individual 
effect of EDBC, SVD (with DCT) and the combination of 
both for different values of threshold is observed. Table V 
depicts the results of EDBC with DWT, DCT-SVD 
combination and features fusion of both combinations. Results 
reveal that the fusion has improved the performance against to 
individual performances. 



TABLE V. PERFORMANCE COMPARISON OF SVD WITH DBC & EDBC ON JAFFE DATABASE (WITH DCT) 



Threshold 


EDBC 


SVD 


EDBC with SVD 


% FRR 


% FAR 


% RR 


% FRR 


% FAR 


% RR 


% FRR 


% FAR 


% RR 


0.0 


100 


0 


0 


100 


0 


0 


100 


0 


0 


0.1 


100 


0 


0 


25 


0 


75 


100 


0 


0 


0.2 


100 


0 


0 


0 


100 


87.5 


25 


50 


75 


0.3 


100 


0 


0 


0 


100 


87.5 


0 


100 


100 


0.4 


37.5 


0 


62.5 


0 


100 


87.5 


0 


100 


100 


0.5 


0 


50 


100 


0 


100 


87.5 


0 


100 


100 


0.6 


0 


50 


100 


0 


100 


87.5 


0 


100 


100 


0.7 


0 


100 


100 


0 


100 


87.5 


0 


100 


100 


0.8 


0 


100 


100 


0 


100 


87.5 


0 


100 


100 


0.9 


0 


100 


100 


0 


100 


87.5 


0 


100 


100 


1.0 


0 


100 


100 


0 


100 


87.5 


0 


100 


100 
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Compared to other approaches such as [37], [38], and 
[39], our proposed algorithm has better %RR on JAFFE 
database as in Table VI. 



TABLE VI. RECOGNITION RATE COMPARISON 
ON JAFFE DATABASE (WITH DCT) 



Method 


Maximum % RR 


DSNGE +SVM [37] 


68.85 


PCA based GA [38] 


96 


GABOR - SVD [39] 


98 


Proposed DE-DS FR model 


100 



B. Performance analysis using ORL database 

The performance is tested on ORL face database, 
consisting of 40 persons with 10 images per person. For 36 
persons each with 9 images is considered for database and 
remaining images are used for testing. Table VII compares 
%FRR, %FAR and %RR using both DBC and EDBC with 
SVD (without DCT) for 0 to 1 threshold variation. The 
%RR attains maximum of 82.85 and 88.57 for both DBC 
and EDBC with SVD respectively. The plot of FAR and 
FRR versus threshold on ORL face database for both DBC 
and EDBC with SVD is in Figure 10 and 11 respectively. 
Table VIII compares EER, optimum %RR and Maximum % 
RR using SVD with DBC and EDBC (without DCT). 





Figure 11: FAR & FRR versus threshold on ORL database 
using EDBC with SVD (Without DCT) 



TABLE VII. PERFORMANCE COMPARISON OF SVD WITH 
DBC & EDBC ON ORL DATABASE (WITHOUT DCT) 



Thres 

hold 


DBC with SVD 


EDBC with SVD 


% 

FRR 


% 

FAR 


% RR 


% 

FRR 


% 

FAR 


% RR 


0.0 


100 


0 


0 


100 


0 


0 


0.1 


100 


0 


0 


100 


0 


0 


0.2 


100 


0 


0 


100 


0 


0 


0.3 


94.28 


0 


5.71 


94.28 


0 


5.71 


0.4 


82.85 


0 


17.14 


80 


0 


20 


0.5 


34.28 


0 


60 


17.14 


20 


74.28 


0.6 


0 


80 


82.85 


0 


100 


88.57 


0.7 


0 


100 


82.85 


0 


100 


88.57 


0.8 


0 


100 


82.85 


0 


100 


88.57 


0.9 


0 


100 


82.85 


0 


100 


88.57 


1.0 


0 


100 


82.85 


0 


100 


88.57 



TABLE VIII. PERFORMANCE COMPARISON ON ORL 
DATABASE (WITHOUT DCT) 



Method 


EER 


Optimum 

%RR 


Maximum 

%RR 


DBC with SVD 


25 


60 


82.5 


EDBC with SVD 


20 


74.28 


88.57 



Figure 10: FAR & FRR versus threshold on ORL database using 
DBC with SVD (Without DCT ) 



TABLE IX. PERFORMANCE COMPARISON OF SVD WITH DBC & EDBC ON ORL DATABASE (WITH DCT) 



Threshold 


EDBC 


SVD 


EDBC with SVD 


% FRR 


% FAR 


% RR 


% FRR 


% FAR 


% RR 


% FRR 


% FAR 


% RR 


0.0 


100 


0 


0 


100 


0 


0 


100 


0 


0 


0.1 


100 


0 


0 


34.28 


0 


57.14 


100 


0 


0 


0.2 


100 


0 


0 


0 


80 


74.28 


100 


0 


0 


0.3 


94.28 


0 


5.71 


0 


100 


74.28 


94.44 


0 


5.56 


0.4 


80 


0 


20 


0 


100 


74.28 


80.56 


0 


19.44 


0.5 


17.14 


20 


74.28 


0 


100 


74.28 


19.44 


25 


72.22 


0.6 


0 


100 


88.57 


0 


100 


74.28 


0 


100 


88.89 


0.7 


0 


100 


88.57 


0 


100 


74.28 


0 


100 


88.89 


0.8 


0 


100 


88.57 


0 


100 


74.28 


0 


100 


88.89 


0.9 


0 


100 


88.57 


0 


100 


74.28 


0 


100 


88.89 


1.0 


0 


100 


88.57 


0 


100 


74.28 


0 


100 


88.89 
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Further the algorithm is tested to observe rationale of 
individual components such as DWT-EDBC and DCT-SVD 
separately, then the combined effect of both on ORL face 
database. The results are in Table IX, which justifies that 
features fusion yields improved performance. Table X 
compares the maximum %RR with other algorithms such as 
[40], [41] and the Error Rate is compared in Table XI with 
[42]. It is clear that the proposed algorithm has superior 
results compared to existing algorithms. 

The major reasons for results improvement are; the LL 
band coefficients of DWT have significant information and 
compressed version of the original image. Using texture 
features of LL band which are derived using EDBC, yields 
less recognition rate, as the edge information of the original 
image is neglected. The SVD is applied on DCT compressed 
coefficients results in less recognition rate, but SVD reduces 
the effect of noise due to illumination variation [39]. 
Singular values of SVD are invariant to translation and 
rotation. The arithmetic addition of EDBC and SVD is used 
in fusion technique to generate final set of features has 
better performance compared to EDBC and SVD alone. 



TABLE X. RECOGNITION RATE COMPARISON ON ORL 
DATABASE (WITH DCT) 



Method 


Maximum % RR 


Local Features + Bayesian classifier [40] 


78.75 


DWT with FKNN classifier [41] 


87 


Proposed DE-DS FR model 


88.89 



TABLE XI. COMPARISON OF % ERROR RATE ON ORL DATABASE 
(WITH DCT) 



Method 


% Error Rate 


Schur faces [42] 


15 


Proposed DE-DS FR model 


11.11 



VI. CONCLUSION 



Verification and recognition of persons using biometrics 
have no ideal solutions in real time. An efficient method 
using DWT-EDBC and DCT-SVD combination for face 
recognition is proposed. In preprocessing scanning and 
cropping of the vital part of face is performed and then it is 
resized to 100*100. The LL band of DWT with size 50*50 
is used. EDBC is applied on LL band to elicit one hundred 
features. Concurrently the preprocessed image is 
transformed using DCT and SVD is applied on DCT output 
to generate another one hundred features. Finally both 
EDBC and SVD features are fused to yield one hundred 
distinct features. The matching results based on ED measure 
is are computed. The performance on JAFFE and ORL face 
datasets are better compared with other existing algorithms. 
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Abstract — This paper evaluates whether the variable -length 
moving average (VMA) as input in NARX outperform the 
univariate exchange rate forecasting performance. Six major 
rates of monthly data from January 1975 to April 2014 
(USDAUD, USDCAD, USDEUR, USDGBP, USDJPY and 
USDCHF) are used to test the proposed model with a (1,5,0) 
VMA rule. 

We evaluate that the VMA can be used as input for NARX model 
and the forecasting accuracy is outperform the NAR univariate 
model with 19.97% improvement on D stat and 3.17% 
improvement on MSE. 

Keywords — forecasting , major exchange rates , VMA , NAR, 
NARX 



I. Introduction 

Exchange rate forecasting has proved to be predictable 
using univariate model and it gives a good forecast 
accuracy[l]. In some cases, univariate specifications are 
limited that the market could be efficient and it can only be 
driven from outside indicators; the available time series are too 
short for significant technical analysis with the chosen 
forecasting horizon[2], and for some exchange rates, univariate 
model does not provide a good forecast [3]. 

Technical analysis studies patterns in historical exchange 
rate series those are generated by time-to-time market 
activities, which aim to predict future market movements. The 
key information used by technical analyst is volume and price. 
Two technical trading rules were firstly tested extensively on 
DJIA during period of 1897 to 1986[4]. These two technical 
indicators are variable moving average (VMA) and trading 
range break-out. The additional research on this method is 
extensively done for many stock index data or single stock data 
[5]— [16] . VMA on exchange rate forecasting is used to filter 
the signal [17]. The use of VMA on exchange rates forecasting 
is still rare. 

Nonlinear Autoregressive with exogenous Inputs (NARX) 
that has been used for exchange rate forecasting with good 
result [18]-[20] can accommodate the VMA indicator as an 
input. 



II. Related Work 

The study of Moving Average with trading range 
breakouts has predictive ability for financial market [4], [6] 
with long sufficient data [15]. 

The profitability of VMA method has been tested on a 
costly trading environment like in UK data[6]. 

Recent literature on VMA has concluded that this method 
is technically more successful in the emerging market of 
Malaysia, Thailand and Taiwan but less powerfull in more 
developed countries like Hongkong and Japan [7]. [9]. 

The recent research shows that this method is not good as 
its historical sample tested out of sample over period of 1987 to 
2011 and [16]. 



III. VMA-NARX Method 



A VMA consists of comparison of two simple moving 
average, a longer- and a shorter- periods. Signals are 
generated by the short-term moving average crossing above or 
below the longer-term moving average [7], [15]. The rule of 
moving average period often uses the convention of 5-20 
periods, 20-60 periods and 100-200 periods to detect short-, 
medium-, and long-term cycles of price movements, 
respectively. It depends on the economics circumstances and 
investors’ behaviors differ [16]. The short- and long-period of 
moving avergae are described below [8]. 



2s-i f 2r=i 

Sp _ ; Lp = — — 



Where R i t is daily return in short-period of S, and R i t _i is 
the return in long -period of L. 



A trading range break-out (TRB) is used to filter the buy 
(sell) signal when the price penetrates the resistance level 
(local maximum or minimum). A buy (sell) signal is generated 
when price rises above (below) the resistance area. A 50, 150, 
200 days of maximum (minimum) is often used as the 
resistance level and with or without 1 % band break-out [4] . 



An important class of discrete-time nonlinear system is the 
NARX model and this model is well suited for modelling 
nonlinear system [21]. 

y(t) =f(u(t-n u ), — l),u(t),y(t-nj,) y(t - 1) (1) 
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Where u(t) and y(t) represent input and output of the 
network at time t, n u and n y are the input and output order and / 
is a nonlinear function. 

VMA can be used as input to an artificial neural network 
especially NARX [22] like other trading indicators. Technical 
indicator is still valuable to trading decision because at least 
90% of the respondent placed some weight on technical 
analysis for decisioon making [23]. 

The research goal were sets whether the VMA as input in 
NARX outperform the univariate exchange rate forecasting 
performance. The method is described as follows: 

1. Compute the VMA of monthly six major exchange 
rates 

2. Use the VMA value as external input for the NARX 
model 

3. Calculate the forecasting performance 

4. Compare with NAR forecasting performance from 
previous work 

IV. Experiment and Results 



A. Data 

The exchange rates data used here are monthly six major 
exchange rates from January 1975 until April 2014. The major 
exchange rates used here are USDAUD, USDCAD, USDJPY, 
USDGBP, USDEUR and USDCHF which are the major 
exchange rates in the foreign exchange market and 65.2% of 
exchange market liquidity in April 2013 [24]. Each data 
contains 472 records which is divided into 80% (377 data) for 
training, 5% (24 data) for validation, and 15% (71 data) for 
testing. This data partition is similar to[25]-[28]. 

The external input or external variable for NARX is 
exchange rate that has been processed with VMA. 

B. Methodology 

The selected variable moving average for short-, long- 
periods and band in this exchange rates forecasting is (1,5,0) 
similar to [17]. The short period is 1 because it is a monthly 
data and it is equal to about 24 daily data. The long periods is 
5 and since it is a monthly data it is long enough and equal to 
about 120 daily data. It is still in the range of its original rule 
[4] and the consensus value [16]. We do not test trading range 
breakout since there is no major differences in efficacy of 
trading model in presence of trading band [8]. 

Once the data calculated with VMA (1,5,0) then it is 
processed by NARX algorithm with preprocessing first, 
network training, network testing and performance 
measurement as shown in Fig 1 . 

C. Forecast Measure 

In order to evaluate the forecast accuracy of the models, 
two forecast error measurements are used: Mean Square Error 
(MSE) and directional statistics (D stat ). MSE is defined as 
follows [29]: 



MSE = V — 

t = i n 



( 2 ) 




Fig. 1 . Forecasting Process 

D stat is defined as follows [3]: 

= ft at *100% 

N '=i (3) 

A 

where at=l if (x t+1 - x t )(x t +i -x t )>0; 

otherwise 0. 

D stat is more preferable in financial instruments forecasting 
because it gives the correctness of gradient prediction [3]. 

D. Result 

The proposed method is tested by using 10 experiments for 
each exchange rate pair. NAR algorithm result is used to get 
the univariate exchange rate forecasting and collected from the 
previous study [30] and used as a benchmark data. The best 
experiment results is presented in Table I for the MSE 
parameter. 



TABLE I. Forecasting performance VMA NARX on MSE 



Method 


USD 

AUD 


USD 

CAD 


USD 

EUR 


USD 

GBP 


USD 

JPY 


USD 

CHF 


NAR 


3,587 J 


U38 1 


6,670 2 


3,260 2 


7.565 


1,375 ! 


VMA-NARX 


3,239' 


U10 1 


6,520 2 


3,259 2 


7.229 


1,373 ! 


MSE Dec.(%) 


-9.70 


-2.46 


-2.25 


-0.03 


-4.45 


-0.15 



1 = * 10 ‘ 3 ; 2 =* 10 4 



The VMA-NARX on MSE parameter gives the best 
derease of MSE for USDAUD (-9.70%) and the worst for 
USDGBP (-0.03%) with average of -3.17%. 

The results for D stat accuracy parameter is more promising 
in VMA-NARX model with the best accuracy improvement for 
USDGBP (37.84%) and the worst is USDEUR (4.66%) and the 
average of 19.97% for all exchange rates. The result is 
summarized in Table II. 
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TABLE II. Forecasting performance VMA NARX on D stat 



Method 


USD 

AUD 


USD 

CAD 


USD 

EUR 


USD 

GBP 


USD 

JPY 


USD 

CHF 


NAR (%) 


57.75 


47.89 


60.56 


52.11 


47.89 


57.75 


VMA-NARX 

(%) 


66.2 


60.56 


63.38 


71.83 


60.56 


63.38 


Dstat Inc. (%) 


14.63 


26.46 


4.66 


37.84 


26.46 


9.75 



The summary result for proposed method showed that 
VMA-NARX gives improvement for MSE and D stat parameters 
for all exchange rates forecasting. With the average 
improvements of 3.17% for MSE and 19.97% for D stat , it can 
be conclude that VMA can be used as input for NARX 
algorithm and the performance of proposed method outperform 
the NAR method for six major exchange rates pairs. 

V. Conclusions. 

The VMA could be applied as external input for a NARX 
algorithm and gives a better result compared to a univariate 
NAR algorithm for six major exchange rates. The proposed 
model gives 19.97% improvement on average for D stat and 
3.17% improvement on average for MSE. 

The accuracy performance could be improved by adding more 
variables to the models or combining with other models for 
further research. 
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Abstract — For more than a decade, many solutions and detection 
mechanisms have been proposed to prevent web applications 
from SQL Injection attacks. However, according to the 2013 
OWASP 10, SQL Injection is still a top threat and attackers can 
find evasive ways to exploit this vulnerability. We implemented 
the proposed database level Database Users Privilege Model of 

[6] . Our findings suggest that by implementing the model, it does 
not detect SQL Injection attacks but with the necessary 
recommendations, it can prevent the attacker from fully 
compromising the database and the database server. 

Keywords-SQL Injection, Database Users Privilege Model, 
OWASP, Database Security, Web Application 

I. Introduction 

Injection is still number 1 in the OWASP top 10 list of the 
most critical web application security flaws. Injection flaws 
occur when an application sends untrusted data to an interpreter. 
Injection flaws are very prevalent, particularly in legacy code. 
They are often found in SQL, LDAP, Xpath, or NoSQL 
queries; OS commands; XML parsers, SMTP Headers, 
program arguments, etc. Injection flaws are easy to discover 
when examining code, but frequently hard to discover via 
testing [1] [13]. A successful injection when the database is not 
well defended, an attacker can gain full access to the database 
and even the database server itself. 

In a SQL Injection, the root cause of its vulnerabilities is 
insufficient input validation and the solution for eliminating 
these vulnerabilities is to apply defensive coding practices 

[7] [11]. Solutions have been proposed and recommended 
mostly in the code level by creating parameterized interface 
and / or escaping special characters of the unsafe input data 
[1] [2] [4] [7] [14]. Scanners and fuzzers systems like Intrusion 
Detection Systems that detects possible SQL Injection flaws 
that are in web applications [3] [4] [5] [7] [8] [9] [1 1] [12]. While 
these code level solutions and detection systems may make 
web applications safer, attackers can still find evasive ways to 
attack. It is also best to be able have another level of protection 
in case of a successful SQL injection. A Users Privilege Model 
has been proposed as another level of protection in the database 
level [6]. 

In this paper we implemented a database level protection 
based on [6] to identify the level of protection of the database 
server in an event of a successful SQL Injection. Section 2 
shows the Implementation of the Database Users Privilege 



Model. Section 3 discusses the findings of a successful SQL 
Injection attack in each user privilege. Finally, Section 4 
concludes the paper. 

II. Implementation OF THE Database Users 
Privilege Model 

Reference [6] proposed the Database Users Privilege Model 
as shown in table 1 . 



TABLE 1 . Database Users Privilege Model 



Username 


Privileges 


viewer_user 


SELECT 


editor_user 


SELECT, INSERT, UPDATE, DELETE 


structure_user 


CREATE, ALTER, DROP, EXECUTE 


super_user 


CREATE, GRANT, RELOAD, CREATE USER 



Table 2 shows the implementation of the above model in a 
mysql database (db). The asterisk (*) represents all. 



TABLE II. Database Users Privilege Model Implementation 



Username 


Privileges 


viewer_user 


CREATE USER 'viewer_user'@ 'localhost' 
IDENTIFIED BY 'viewer_pass'; 




GRANT SELECT ON db.* ON 
' viewer_user' @ 'localhost' ; 


editor_user 


CREATE USER 'editor_user'@ 'localhost' 
IDENTIFIED BY 'editor_pass'; 




GRANT SELECT, INSERT, UPDATE, DELETE 
ON db.* ON 'editor_user'@ 'localhost'; 


structure_user 


CREATE USER 'structure_user'@ 'localhost' 
IDENTIFIED BY 'structure_pass'; 




GRANT CREATE, ALTER, DROP, EXECUTE ON 
db.* ON 'structure_user'@ 'localhost'; 


super_user 


CREATE USER 'super_user'@ 'localhost' 
IDENTIFIED BY 'super_pass'; 




GRANT CREATE, GRANT, RELOAD, 
CREATEJJSER ON db.* ON 
'super_user'@ 'localhost'; 
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Table 3 shows a simplified PHP Language Connection attack in the login page may also make the attacker any user he 
based on table 2. wants. 



TABLE III. PHP LANGUAGE CONNECTION 

connection.php 

<?php 

class DBConnection { 

// db user variables 

private $viewer_user = “viewer_user”; 
private $viewer_pass = “viewer_pass”; 
private $editor_user = “editor_user”; 
private $editor_pass = “editor_pass”; 
private $structure_user = “structure_user”; 
private $structure_pass = “structure_pass”; 
private $super_user = “super_user”; 
private $super_pass = “super_pass”; 

// database variable for the database connection 
private $database = “db”; 

//host variable 

private $host = “localhost”; 

//connection variable 
private $con; 

public function db_connect($login_status){ 
switch($login_status) { 

case 1: $user = $viewer_user; $pass = $viewer_pass; break; 
case 2: $user = $editor_user; $pass = $editor_pass; break; 
case 3: $user = $structure_user; $pass = $structure_pass; break; 
case 4: $user = $super_user; $pass = $super_pass; break; 
default: $user = $pass=”” 

} 

$this->con = @mysql_connect($this -> host, $this -> user, $this -> pass)); 

} 

?> 



III. Findings, Discussions & Recommendations 

Results of a successful SQL Injection on the Database 
Users Privilege Model are shown here together with the 
discussions and recommendations 

A typical database for a web application contains the users 
table, and the data tables relevant to the function of a web 
application. 

A. Viewer User 

The “viewer_user” has a privilege of SELECT only. 
Meaning it has the capability of retrieving records from the 
database. 

User Example: A typical example of a user that has this 
privilege is guest in online shop where just goes on viewing the 
public pages. 

Effect: Once a successful SQL Injection attack, it can only 
get records from the database. While it cannot modify or 
change any data, the users table is also available for viewing. 
The attacker can now try to brute force the password and log 
into any user he wants. Even if the attacker doesn’t know or 
cannot brute force the password, a successful SQL Injection 



Recommendation: This type of user must not have a 
SELECT access to the users table. 

B. Editor User 

The “editor_user” has a privilege of SELECT, INSERT, 
UPDATE, and DELETE. Meaning it has the capability of 
adding, updating retrieving, and deleting records from the 
database. 

User Example: A typical example of a user that has this 
privilege is a customer in an online shop where he can modify 
his personal information, make or cancel an order and so on. 

Effect: Once this user logs in or his account is hijacked and 
does a successful SQL Injection attack, it can do all the damage 
to all tables by modifying the content of the tables and even 
deleting all the records. 

Recommendation: This type of user must not have any 
privilege to the users table as well as tables that he has nothing 
to do with. 

C. Structure User 

The “structure_user” has a privilege of CREATE, ALTER, 
DROP, EXECUTE. Meaning it has the capability of creating, 
altering and dropping tables, indexes and views and executing 
stored routines. 

User Example: A typical example of a user that has this 
privilege is the DB admin in an online shop where he can 
modify the structure of the database. 

Effect: Once a successful SQL Injection attack, it can also 
do all the damage to all tables in the database much more than 
the structured user by erasing all the tables and views 

Recommendation: Most of the DB Admin pages are for 
these types of functions that are considered out of reach from 
the public. Caution in implementing the use of this type of 
database connection in web applications. 

D. Super User 

The “super_user” has a privilege of CREATE, GRANT, 
RELOAD, CREATE_USER. Meaning it has the capability of 
creating, tables and indexes. It also has a capability of creating 
new users and can grant privileges to mysql database users and 
can do mysqladmin commands. 

User Example: A typical example of a user that has this 
privilege is also the DB admin in the overall Database Server 
where he can manage the server. 

Effect: Once a successful SQL Injection attack, it can also 
do whatever he wants to do of the database server. 

Recommendation: The functions of this user are for 
managing the DBMS and must be out of reach from the public. 
Caution in implementing the use of this type of database 
connection in web applications. 
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IV. Conclusion 

There are a lot of proposed solutions as well as detection 
mechanisms to combat SQL Injection attacks however the 
attackers still manages to find ways to exploit the injection 
vulnerabilities in web applications. In the 2013 OWASP Top 
10, SQL Injection still ranks as the top vulnerability. The 
implementation of the Database Users Privilege Model of [6] 
with the necessary recommendations shows that it cannot 
detect or fully eliminate SQL Injection attacks but it may 
prevent the attacker from fully compromising the database 
server. 

Web Application developers must still follow the best 
practices to prevent SQL injections attacks. They may 
implement the Database Users Privilege Model of [6] with the 
necessary recommendations to provide another level of security 
of the database and the database server. 
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Abstract — Currently, mobile devices are being widely used by of 
a considerable number of people. The need to be connected 24 
hours a day is becoming a reality, because users need to make 
online purchases, make payments, access social networks, surf the 
Internet, check e-mails and so on. In this context, users with 
mobile devices, especially smartphones, using the Internet to 
connect to specific applications and safety issues may arise 
because, for example, sensitive data may be sent over an insecure 
channel (Internet). The aim of this paper is to present an 
overview of current security risks and security solutions related 
to smartphones based on the Android platform. Security risks 
have been divided into five categories, and these risks are 
presented and detailed on the corresponding categories. In 
addition, still stand out security solutions that are currently 
available on Android stores and these solutions can be used to 
eliminate or mitigate the risks. 

I. Introduction 

Currently, mobile devices are undoubtedly present in the 
daily routine of a considerable part of the community. The need 
to be connected twenty four hours a day is apparent because 
users need to shop online, perform payments, access social 
networks, surf the Internet, check emails and so on. By the end 
of this year (2015), the number of smartphones worldwide will 
reach almost 7 billion, close to the number of inhabitants on the 
planet, according to data released by the International 
Telecommunication Union (ITU-T) [10]. The entry of new 
phones will reach 96% worldwide by December, thanks to 
emerging countries, which represent almost 78% of all phones 
in use around the globe. Currently, two of three Internet users 
are located in developing countries [7]. 

In this context, considering that users with mobile devices, 
especially smartphone, utilize the Internet to connect to specific 
applications, security issues may arise, because sensitive data 
may be sent over this unsecure channel. On the current days, 
more than three billion mobile devices suffer all kind of attacks. 
In a recent study conducted by the Chinese company Cheetah 
Mobile, it is possible to observe that in the last 12 months the 
number of malwares for Android devices increased 600% [5]. 
Based on this and other facts, it is imperative to consider 



security measures and mechanisms in the mobile applications 
use and development. 

The main objective of this work is to present, describe and 
categorize the current most relevant security issues and 
solutions for Android-based mobile devices (especially 
Android-based smartphones). To achieve that, five categories 
of security issues are proposed and detailed, and they can be 
used to reason about available security attacks. 

This paper is structured as follows. Section II presents 
relevant basic concepts that help to understand this work. 
Section III introduces the proposed five categories, in which 
security issues are described and detailed. Section IV details 
current solutions for security in Android-based mobile devices. 
Finally, Section V presents the conclusions and future work. 

II. Basic Concepts 
A. Android Architecture 

Android is an operating system that was developed based 
on UNIX and it was designed to be used in mobile devices. 
Usually, applications for this operation system are developed 
using the Java programming language, which are then 
deployed in a specific virtual machine named Dalvik. The 
Android architecture is basically composed by five layers, 
which are described as follows. 

• Applications. The main purpose of this layer is to 
provide the basic functions of the device for high-level 
users. These applications may come available on the 
smartphone (eg e-mail, calendar, web browser and 
organizer) and also be downloaded by users on 
specific online shops. 

• Application Framework. This layer is developed 
mostly in Java, and interfaces with Android 
applications. It provides a set of libraries to access the 
various features of the device such as the graphical 
interface, locator (GPS), persistent database storage on 
the SD card and so on. 
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• Libraries. Composed of several native libraries built 
for specific mobile device architectures and provide 
many different features, such as screen graphical 
access, web rendering engine, internal database 
access, SSL / TLS channel establishment and so on. 

• Android Runtime. Its main role is to support the 
applications execution. Basically, the programs are 
written in Java and then converted into machine code 
called bytecode, an intermediate step between the 
source code and code readable by the hardware. These 
bytecodes can be executed by a specific virtual 
machine called Dalvik. 

• Linux Kernel. Based on the fact that Android uses a 
modified version of the 2.6 Linux kernel, it provides 
some important services, such as security and memory 
management, and also an abstraction layer for 
hardware utilization. 

B. Security Overview 

Security is a topic that has been well discussed by the 
community in past years [20]. Due to that, some security basic 
concepts are being widely used and are briefly highlighted in 
this section. 

Confidentiality. The information should not be made 
available or disclosed to unauthorized individuals, entities or 
processes. For example, confidential information sent over the 
Internet (such as credit card number) should not be accessed by 
unauthorized users. 

Availability. The information should be accessible and 
useable, even considering periods with high demand. For 
example, the loss of connectivity of a relevant server can 
impact the system availability, because this system may 
become unavailable to external users. 

Integrity. Consists of protecting information against non- 
authorized modification without the permission of the 
information owners. 

Authentication. Property that ensures that the information 
was originated by the expected source and that there was not 
any in the communication process by unauthorized users. 

Access control. The main objective is to prevent 
unauthorized users from accessing the system and its 
functionalities. For example, it is important to ensure that 
unauthorized users will not access specific wireless networks. 

III. Commom Issues and vulnerabilities In Android- 
Based Environments 

This section presents and details current security issues and 
vulnerabilities that appear when using mobile application and 
devices in Android-based platforms. For didactic reasons, they 
were classified into five categories: Device Loss, Unsecure 
Communication Environment, Usage of Programs from 
Suspicious Sources, Vulnerabilities in the Mobile Application 
Development and User Misconfiguration. 



A. Device Loss 

When a mobile device is lost, not only the hardware is 
committed, but also its internal information (for example, 
photos, personal documents and settings). For example, a 
malicious user can put a USB cable on a smartphone and have 
access to all the information that is available on the device. 
This section aims to focus on relevant issues of security that 
appears in this context. 

We focus primarily on five security issues: USB -exposure, 
loss of information, blackmail, SIM exposure and exposure of 
internal data. 

In the case of security problem "USB -Exposure", the main 
risk is based on the fact that an unauthorized user can access 
private data of the device by connecting to the USB port of 
your smartphone. Based on this, this user will have access to 
private photos and data files. One possible solution for this 
problem is the usage of a password on the smartphone and to 
use encryption on the smartphone memory card. 

Other relevant issue is "Loss of Information", and is 
relevant because the device owner will lose access to the 
internal data of this device if preventive measures were not 
taken. For example, if the owner did not made a backup of the 
more important files, this owner may no longer have access to 
these files. 

Another relevant security problem related to the device loss 
is related to the fact that a malicious user can require specific 
actions or recompense in order to return the lost device or to 
not use and disseminate its content. This security issue is 
known as "Blackmail". One possible countermeasure for this 
situation is to install specific security applications that are able 
to remotely access the mobile device and take actions such as 
to format it. 

Another source of security issues is based on the fact that 
the SIM card is the subscriber identification module for the 
phone operator. Malicious users can use list SIM cards for a 
diversity of actions. The SIM card protection is a very 
important practice. However, a considerable number of users 
does not put a password on the SIM card because they have 
already put a password on the home screen of the mobile 
device. 

Finally, another risky situation is to have the data card 
violated without user consent. This security issue is referred to 
as "Internal data exposure" and is somehow related to other 
security issues that were previously introduced in this section. 
However, the focus at this point is how to protect the contents 
of the data itself. One possible and commonly used approach is 
the adoption of encryption mechanisms to prevent unauthorized 
users from understanding the stolen data. The main objective is 
to transform the user's mobile data such as email and photos, in 
a set of characters that are very difficult to understand. Even if 
an unauthorized user has access to the encrypted information, 
that person will not be able to retrieve the information in a 
readable format, because this user does not have access to the 
decryption key. 
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B. Unsecure Communication Environment 

Currently, mobile users are having to an increasing number 
of wireless networks (especially Wi-Fi), which offer fast and 
efficient connection to mobile devices. However, eventually it 
becomes an unsafe environment in which the user does not 
know, for example, who is managing the network. Based on 
that, some security issues may arise. For example, an 
unauthorized person can have access to the information that is 
being sent across the network. Several security issues exist in a 
non-secure communication environment. This category focuses 
on security issues related to the use of open Wi-Fi networks. 
More specifically, three points should be analyzed in the 
context of users from accessing open networks: web pages 
capturing, obtaining the credentials of an FTP server and 
reports abduction. 

An user that accesses web pages on the mobile device 
connected to an open Wi-Fi network could suffer an attack 
considering that a malicious user can scan the open network 
and view all traffic that are going through the network using 
computational resources such as a packet capture program. 
Another concern appears if an user uses an application to 
connect to the mail server without using encryption. It this case, 
the attacker is able to access the login details of the user and/or 
device. 

Kidnapping accounts or accessing unauthorized user 
information occurs not only in personal computers, but also in 
mobile devices. In this type of attack, an attacker with a 
computer running a malicious application can access personal 
data or accounts on popular services such as Gmail, Linkedln, 
Yahoo and Facebook. To protect against attacks on data over a 
wireless Internet network, the best way is to avoid access social 
networks, internet banking and other sites that may have access 
to user data. 

Some security mechanisms and strategies can be used to 
avoid security problems. More specifically, the utilization of 
cryptographic algorithms has being widely adopted to secure 
the communication process in this case. Some of these 
algorithms are: 

WEP. This is one of the first encryption algorithms to be 
used. Currently, it is considered not strong and therefore its use 
should be avoided. 

WPA. This algorithm was proposed in order to address 
some of the weaknesses of the WEP mechanism. Considering 
the current state of the art, it is more suitable in comparison to 
WEP. 

WPA -2. This algorithm is similar to WPA, but is 
considered stronger in terms of encryption complexity. It is 
recommended mechanism. 

C. Usage of Programs from Suspicious Sources 

In this category, the focus is on the use of programs and 
applications that are obtained from suspected sources (or, at 
least, not well known sources). Basically, this section present 
attacks that are based on the use of programs with suspicious 
sources. 



Initially, viruses such as malware or trojan horse can come 
disguised as official insurance programs on Google Play stores. 
Despite the claim that Google Bouncer program is a service 
that checks applications for evidence of malware in the 
Android Market, it is known that there are a number of 
malicious software in these stores [9] . 

With the variety of applications for mobile devices, a 
considerable number of users download other programs from 
unofficial stores, and therefore are susceptible to attacks. These 
applications are, for example, malwares that turn users' mobile 
devices into botnets, which is a malicious software that is 
capable of turning the mobile device in a bot (which is also 
known as zombie). When this event occurs, the device can run 
automated tasks over the Internet without the user knowledge 
and authorization. Figure 1 illustrates this scenario. 




Figure 1- Using Programs from suspicious sources 

As mentioned, with the large number of mobile devices in 
the world and the users’ inexperience in their secure utilization, 
these users may download applications without knowing its 
origin, leading to the possibility of Botnet attacks. A Bot-Net is 
a collection of software robots (called bots) that are executed 
automatically and can compromise multiple computers or 
mobile devices that were invaded by malicious programs such 
as worms, Trojan horses and so on. They are executed under a 
common control infrastructure. They serve several purposes, 
including denial of service attacks, often without the user's 
knowledge. Its main objective is to have the largest number of 
infected mobile devices, because the total computational power 
of these devices have enough bandwidth and processor speed to 
cause significant damages in high-scale attacks. 

Another important point is to download programs that are 
not connected to the official stores. In this case, the user does 
not know the origin of the program and thus taking a 
considerable risk of downloading a malware. It is worth noting 
that even downloading a program or application in the official 
stores is always recommended to have an antivirus. The amount 
of programs and applications that are placed daily in official 
stores are analyzed to detect the presence of malwares. More 
specifically, Google recently added a new layer of security to 
the Android Market named Bouncer that scans new and 
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existing applications automatically for malware evidences. 
However, this is not enough, and the user should check the 
source of the program and other related information to 
guarantee the program safety. 

One of the most important relevant risks involved in 
downloading programs from suspicious sources is the malware 
infection. Malware is any type of unwanted software installed 
without user consent. Viruses, worms and Trojans are examples 
of malicious software that are often grouped together and called 
collectively malware. According to the survey conducted by 
Kaspersky Lab [13] Android is still the main target of 
malicious attacks considering all mobile platforms. 98,05 % of 
the malicious programs for mobile devices that were discovered 
in 2013 are directed related to the Android platform, and this 
fact helps to understand the need for security measures 
considering Android-based programs and devices. 

Considering the attacks described in this section, some 
actions can be taken to mitigate or even avoid their occurrence. 
Before downloading any application, it is important to check 
the latest comments from other users about the application. In 
addition, it is also relevant to avoid downloading applications 
or programs outside of official store shops, because the amount 
of malware and viruses that usually comes in other sources are 
comparatively higher. 

D. Vulnerabilities in the Mobile Application Development 

The mobile programs itself can be source of attack risks. 
To develop mobile applications without considering security 
risks and measures is other relevant commom issue currently 
in Android-based programs and devices. Three situations 
appears with relevance in this context: poor security 

knowledge of the development team, indirect attack and 
applications that require advanced permissions on Android. 

Insufficient security knowledge of the application 
development team is a major concern, because many 
applications are placed daily in official stores without 
considering basic security concepts and tools. Usually, 
developers creates simpler applications, without considering 
security requirements in the development process. This fact 
contributes to the existence of security risks and threats. 

Another relevant point is the indirect attack, where 
Android applications have some certificates previously 
recorded by developers, giving them special access and 
privileges within the operating system without going through 
the chain of the certificate validation process. For example, 
Adobe is giving certificates to their applications. This process 
exists to allow other applications to stop using the plug-in 
Adobe Flash Player. One possible situation is that an attacker 
can validate a malicious application with a certificate that 
appears to have been validated by the prerecorded code from 
Adobe, but actually is not. Also, while the Adobe certificate is 
present in the application's certificate chain, the system accepts 
code released by this application and inject in other 
applications. 

Finally, there are risks associated to the applications 
permissions. To install an application, is necessary for the user 



to accept some permissions. For example, the user may 
downloads an application which requires the user permission 
to use the camera, to the GPS and other relevant function. By 
doing that, the user may become more vulnerable, because this 
application can use these permissions for other purposes that 
were not mentioned in the installation process. Figure 2 
presents an example in which an application requests some 
permissions for its installation. 

The acceptance of the permissions request can generate 
unpleasant consequences, such as theft of personal data. The 
problem becomes even more relevant when permissions are 
combined, because to review all the permissions is exhausting, 
and usually the user just accepts the request without performing 
a significant review. A preventive measure is to not install an 
application that requires too much permission, except if they 
are considered essential. Unfortunately, usually it is not 
possible to accept only a subset of these permissions. 



In-app purchases 
S3 Identity 
^ SMS 

Photos/Media/Files 
% Wi-Fi connection information 
0 Device ID & call information 



ACCEPT 



Figure 2- Example of an application requiring Android permissions. 



E. User Mis configuration 

A considerable number of security attacks occurs based on 
the poor security knowledge of the high-level user. This type of 
user is generally not fully aware of what is being done when 
some actions are performed in the mobile device. Based on 
that, this last category is named "User Misconfiguration", and 
focus on some common user misconfigurations that can 
produce relevant security risks. 

The first possible misconfiguration is related to the use of 
Bluetooth and GPS. Many users utilize mobile devices on a 
daily basis and do not care if the Bluetooth or GPS are active 
(even without using them) all the time. If the Bluetooth is 
active, an attacker could obtain sensitive data such as user 
information without the proper consent. In the case of GPS, 
leaving it active can makes possible to an attacker locate the 
current position of the user and use this information against the 
user desire. In addition, to leave this resources turned on all the 
time infers in more energy consumption, which impacts on the 
availability of the device. 
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Another type of attack occurs when there is not any 
password in the main screen of the mobile device. To use a 
mobile device without password in the initial screen causes the 
user to be vulnerable to various attacks. For example, a 
physical attack on the device can occur, in which a malicious 
user with access to the device may access its internal data. 

Finally, another type of attack is based on the fact that it 
is very common for users that need to access the Internet via 
notebook share their Internet connection of his mobile device 
by creating a Wi-Fi network. This procedure is known as a 
portable Wi-Fi router or anchorage. If not configured properly, 
this network is susceptible to attacks, in which an attacker may 
have access non-authorized information. This unsecured share 
of the smartphone Internet with the active router function can 
cause, for example, theft of documents, images and video files, 
installation of malicious software and passwords theft. 

IV. General Solutions for the Security 
Improvement of the android device 

It was highlighted in the previous sections how users are 
susceptible to security risks due to malicious user attacks, 
mobile loss, negligence developers, mobile user configuration 
errors and so on. 

The focus of this section is to introduce relevant initiatives 
that propose general strategies to check and/or improve the 
general status of the device security. 

A. Academic Initiatives 

In [24], the author shows the creation of an application 
called Android Security Test, and one of its main features is to 
identify and classify threats considering its associated risks. 
The goal is to analyze the application settings and user 
applications in order to identify potential threats. The use of 
the system can be considerably simpler; the user must first 
download the application, and then click the "Test Your 
Phone" feature, which collects various information about the 
mobile settings to check the level of the current mobile 
security. Thereafter, applications send the information 
(security level), in the form of a report to the user. 

In addition, the Android Security Testing Initiative [24] 
analyzed the data that was collected from more than 375,000 
users who downloaded the application from November 2010 to 
March 2011. The data collected include information on the 
device, general settings and application permissions. In 
resume, this data shows three important facts. Firstly, most 
users are not using the security mechanisms that are available 
for use on the Android device such as to use password in the 
initial screen, to encrypt the memory card and so on. Secondly, 
due to its open nature, operating system updates and security 
patches cannot be generated and distributed from a single 
source. Finally, the Android security permission mechanism is 
poorly documented and prone to misuse by developers. 

Another interesting work, named LeakMiner [24], consists 
on an application that controls the security of Android mobile 



device. Most applications for PC and Android application deal 
with personal information, such as contacts and SMS 
messages, and the leakage of such information can cause 
significant damage for Android users [25]. Recent approaches 
to Android leak detection are focused on the dynamic analysis 
of the user's location, based on the fact that the leakage 
requires more runtime overhead. LeakMiner focused on the 
detection of confidential information leakage in Android with 
static analysis rather than dynamic approaches. Based on this 
fact, LeakMiner can detect information leaks before 
applications are distributed to users. LeakMiner detects three 
activities in the loss of information. 

B. General solutions available on the Android Market 

Focusing on the Android user’s safety, companies and 
institutes have created several applications to minimize the 
risk of malware attacks and threats in the system. 

In a recent survey conducted by AV-Test [3], an 
independent security institute, malware detection tests were 
performed in 30 applications available on Google Play. 14 of 
these applications were able to detect any kind of malware: 
Antiy AVL [1], Avira [2], Bitdefender [4], GData [8], Ikarus 
[11], Kaspersky [13], Kingsoft [14], KSMobile CMS [15], 
McAfee [17], Norton [18], Qihoo 360 [19], Trend Micro [21] 
and TrustGo [22]. 

Considering the security applications submitted to the AV- 
Test Institute, the applications that stand out were: Avira, 
Bitdefender, Kaspersky, McAfee, Quick Heal Mobile Security 
and Norton [9]. Each one of these has its relevance, but Ikarus, 
Quick Heal Mobile Security and Trend Micro has specific 
features that will be described at this point. 

Ikarus antivirus [12] protects mobile giving reliability to the 
user against malware and Internet applications, yet finds and 
removes viruses, Trojans, spyware, adware and other malware. 
Developers ensure that this security application does not 
consume a lot of battery and memory. An interesting 
advantage of this solution is the automatic monitoring of 
infection [9]. 

The solution developed by Trend Micro, called Trend Micro 
Mobile Security [21], is a security application that has anti- 
phishing, download protection and a specific filter for 
unwanted messages. One of its main features is to block 
malware that may come in the application downloaded from 
Google Play. Another important feature is to protect user 
privacy finding lost phones or tablets, and also performs 
backup of photos and videos with 50 MB of storage in the 
cloud. 

Finally, the Quick Heal Mobile Security application [22] is 
available for the Android operating system and has the 
function of protecting the mobile device against malware and 
also against theft or loss. This solution also has security filters 
that can block numbers and also unknown numbers starting 
and ending with a certain numbers. This security application 
also has the function of notifying the user of programs that can 
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affect the user privacy, and guides this user on settings that can 
improve the security of the device. In addition, the user can 
manage the mobile device via the management portal. 

V. Conclusions and Future Work 

This paper aimed to present and analyze current security 
vulnerabilities and solutions focused on Android-based mobile 
devices. The main objective is to present an updated set of 
security risks, vulnerabilities and applications focused on the 
Android platform. One of the most important strategies 
presented on this work is based on the search of risky settings 
made by the high-level user. To search, identify and mitigate 
these risks is very important because a considerable number of 
attacks are based on the user misconfiguration. 

As future work, we intend to implement a specific 
solution to search for possible user misconfiguration and to 
suggest corrections to the high-level user in order to increase 
the device security level. Thus, this solution will be able to 
reduce the number of vulnerability points caused by poor user 
configuration, increasing the security level of the mobile 
device. 
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Abstract — Over the past few years , we have witnessed an ever- 
increasing rampant battle between network managers & computer 
criminals which led to many developments in the tools used by both 
parties i.e. " legitimate computer professionals & computer 
criminals". Recently , interruption exposure & avoidance systems 
have earned increasing appreciation due to their more frequent 
involvement in the security domain since they allow security 
analysts to make fast reactions and thus take quick decisions to 
prevent any possible damage. This underlines the significance of 
upgraded detection & prevention processes , taking into 
consideration the competitive and dynamic network environment 
these days. Most commercial and government information systems 
are connected through the internet , which will expose them to 
possibilities of spasms. The increasing number of computer users 
marks it precise, ever since computers have become a necessary 
device in our lives. This article covers the IDPS state of art and 
discusses the IDPS challenges . 

Keywords -component ; Network, IDS, IPS, Security, IDS Myths, 
IDS Alerts. 

I. Introduction 

Before we go into the challenges of IDPS we need to 
highlight & understand each term that concerns the IDPS 
starting with IDS, IPS and Threats. 

A. Intrusion Detection System (IDS) 

Intrusion detection can be defined as the method to identify 
“any set of actions that attempt to compromise the integrity, 
confidentiality, or availability of a resource” [1, 2, 3]. IDS 
generally monitors any system held in three phases: the 
collection of audit data, the analysis of collected data and 
finally the release of an alert when a threat is detected [4, 5]. 
IDS can be classified in many different ways; the major 
classifications are: Active and Passive IDS, Network Intrusion 
detection systems (NIDS) and host Intrusion detection systems 
(HIDS), and finally, Knowledge-based (Signature-based) IDS 
and behavior-based (Anomaly-based) IDS [4 - 7] . 



1 ) Active and passive IDS 

An active Intrusion Detection System (IDS) is also known 
as the Intrusion Detection and Prevention System (IDPS) [8]. 
Intrusion Detection and Prevention System (IDPS) is 
constructed to automatically block suspected attacks without 
any interference required by an operator. Intrusion Detection 
and Prevention System (IDPS) has the benefit of providing 
real-time corrective action in response to an attack [2, 4, 8]. 

A passive IDS system is assembled just to monitor and 
analyze network traffic activity and alert an operator to 
possible vulnerabilities and attacks. A passive IDS is not 
capable of performing any protective or corrective functions on 
its own. 

1) Network Intrusion detection systems (NIDS) and Host 
Intrusion detection systems (HIDS) 

A Network Intrusion Detection System (NIDS) usually 
consists of a network appliance (or sensor) with a Network 
Interface Card (NIC) operating in promiscuous modes and 
which has a separate management interface; NIDS normally 
runs at the gateway of a network to capture and examine 
network packets that go through the network hardware 
interface [9, 10]. 

A Host Intrusion Detection System (HIDS) relies on 
operating system audit data to monitor and analyze the events 
generated by programs or users on the host that may cause any 
intrusion attempts on critical servers. HIDS monitors the 
operating system and write data to log files and/or trigger 
alerts. HIDS can only monitor the individual workstations on 
which the HIDS are installed and it cannot monitor the entire 
network [10]. 

2) Knowledge-based (Signature-based) IDS and behavior- 
based (Anomaly -based) IDS 

A knowledge-based (Signature-based) Intrusion Detection 
System (IDS) references a database of previous attack 
signatures and known system vulnerabilities. IDS is 
documented evidence of an intrusion or attack. Each intrusion 
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leaves a footprint behind (e.g., nature of data packets, failed 
attempt to run an application, failed logins, file and folder 
access etc.). These footprints are called signatures and can be 
used to identify and prevent the same attacks in the future. 
Based on these signatures Knowledge-based (Signature-based) 
IDS systems identify intrusion attempts. 

Some of Signature-based IDS disadvantages of are: 
Signature database must be constantly updated and continued, 
and Signature-based Intrusion Detection Systems (IDS) may 
fail to identify matchless attacks [5-8, 11, 12]. 

A Behavior-based (Anomaly -based) Intrusion Detection 
Systems (IDS) references a baseline or learned pattern of 
normal system activity to identify active intrusion attempts. 
Deviations from this baseline or pattern cause an alarm to be 
triggered. Higher false alerts are often related with Behavior- 
based Intrusion Detection Systems (IDS) [11-14]. 

B. Intrusion Prevention System 

Intrusion prevention system (IPS) is a network 
device/software that works similar to a firewall in order to 
recognize and block network threats, by evaluating each packet 
based on the network protocols in the application layer. Since 
Intrusion prevention came out of research on the short comings 
of intrusion detection (Prevention after Detection), so it got the 
same categories or types based on the methodology used 
(Signature base or Anomaly base), and that is the reason why in 
some article they write (IDPS) to refer to IPS [8, 13, 20]. 

2 ) Myths About Intrusion Prevention 

There are a diverse number of superstitions about Intrusion 
Prevention. The IDS marketing system caused most of this 
disinformation, or the unawareness of the proper way to 
operate a well-designed in-line IPS device, and the lack of 
information about its capabilities [21]. Let's examine a few of 
the most common myths: 

a) MYTH I: Each one of the detection & prevention of 
intrusion methods has a single solution. 

At the time being, this is the case. The limitations of 
integral performance have caused to design IPS products with a 
very limiting signature set on board, and a little space to 
increase it without affecting performance seriously. A limited 
number of exploits could be prevented, and these are mostly 
the most serious; it undeniably means that a little space is 
available to the security administrator, to nip out the product 
for his own environment, it means as well that because the 
detection capabilities of the IPS product are so restricted, in 
order to alert on those exploits that aren’t covered, an extra IDS 
product is compulsory overdue it. Intrusion prevention 
products that are designed from bottom to top should be 
capable of providing a far-reaching signature set that allows 
them to operate in either or both IDS and IPS modes. 

The most supple IPS appliance will make it possible to 
initiate in passive IDS mode, while possibly attached to a wide 
port or network tap device in order to permit the administrator 
to determine how successful it can be in detecting an extensive 



variety of exploits and (just as importantly in the case of an 
IPS) how prone it is to false positives. 

When the signature set is adjusted, switching to in-line 
mode and starting to block some or all detected suspicious 
packets and flows, is a simple procedure for the administrator. 
Good Intrusion Prevention is actually an extension of IDS, not 
something completely separate [21]. 

b ) MYTH 2: Intrusion Prevention Is ALL or NOTHING 

As shown in the previous myth, this is obviously false when 

the right kind of appliance is deployed. It is possible to have it 
block only a subsection of exploits, even where an IPS product 
can only operate in an in-line mode, and even as most packets 
are passed through as normal. Behind the IPS device, you’ll 
have a traditional passive-mode IDS, which does most of the 
detection and alerting on suspicious traffic. Obviously, this is 
not a case of all-or-nothing; however, the use of two separate 
tools will undoubtedly cause implementation and management 
problems. This can be enhanced significantly with a device that 
has been designed from the ground up as an IDS as well as an 
IPS. As it has been shown in Myth #1, it is possible to structure 
an appliance that offers both IDS and IPS functions at the same 
time, providing an unassailable migration path from pure 
detection to prevention. Now imagine a device with several 
network ports, with each port capable of supporting SPAN, tap 
or in-line mode. Now you overcome the "all or nothing" 
concept and adopt a truly combined IDS/IPS solution in a 
single box. One pair of ports can be joint to provide an in-line 
prevention feature (say on the private LAN), while another pair 
of ports can be selected as a passive-mode IDS (say on the 
DMZ), providing full detection and alerting features. Now the 
administrator can deploy both technologies using a single 
appliance controlled by a single management interface. The 
management and configuration capabilities are also important 
if we are to avoid the "all or nothing" notion. Previously, 
network sensors have often used a monolithic approach to 
setting intrusion policy and response, with the response being 
fixed according to the signature. Modern IDS/IPS sensors, 
however, should be capable of permitting the administrator to 
adjust the response according to a per-signature or per- 
signature group basis perhaps port scans are given abnormally 
low significance in one particular environment, while IIS Web 
server exploits are blocked and the administrator paged. 
Utilizations are dissimilar, and so the IDS/IP device should 
demonstrate enough flexibility to allow the administrator to 
configure the alerts and responses to his or her specific needs 
[ 21 ]. 

c) MYTH 3: Intrusion Prevention Is TCP Kills/Resets or 
Modify Firewall Rules by IDS 

One can easily see where this myth came from. Quickly 
review the marketing literature of many traditional IDS 
products today, and you may well see claims that they offer 
"Intrusion Prevention” features. Only one kind of prevention 
can be delivered by a passive IDS device i.e. to send TCP 
Resets to both ends of the connection when a doubtful packet 
has been detected, or maybe to reconfigure an external firewall 
or router device to ensure that the rest of the flow is blocked at 
the network borderline. Nevertheless, the problem here is that 
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unless the invader is operating on a 2400 baud modem, the 
probability is that by the time the IDS has detected the 
unlawful packet, raised an alert, and transmitted the TCP 
Resets, and especially by the time the two ends of the 
connection have received the Reset packets and acted on them 
(or the firewall or router has had time to activate new rules to 
block the remainder of the flow), the load of the exploit has 
long since been conveyed. We believe that there are not many 
offenders using 2400 baud modems these days. A true IPS 
device, however, is sitting in-line all the packets have to pass 
through; therefore, as soon as a suspicious packet has been 
detected and before it is passed to the internal interface and on 
to the protected network, it can be aborted. Moreover; as the 
flow has been identified as suspicious, all subsequent packets 
that are part of that session can also be aborted with minimum 
extra processing. It is also possible to send TCP Resets or 
ICMP Unreachable messages to the attacking host [21]. 

d) MYTH 4: Intrusion Prevention Is Losing Control Over 
Intrusion Detection and Response 

To this point, hopefully we have made it clear that this is 
simply not true. The IPS device has been designed properly 
and, in the way of intrusion detection and response, it offers 
more than any basic IDS product. If carefully designed, 
typically involving custom hardware and ASICS for the highest 
levels of performance when operating in in-line mode, the IPS 
device can provide remarkable detection capabilities that are 
equal to the best passive IDS. In addition, only an in-line IDS 
can block all IP/ICMP/TCP/UDP based malicious traffic from 
reaching the intended target hosts with complete reliability 
and/or scrub non-conforming packets to defeat many DoS or 
reconnaissance attempts. Most customers wish to use the IDS 
in the Intrusion Detection Mode (sniffing mode) initially and 
then migrate to the Intrusion Prevention mode (in-line mode) 
[ 21 ]. 

C. Computer Security Threats 

Based on Techopedia; "a threat, in the context of computer 
security refers to anything that has the potential to cause 
serious harm to a computer system. A threat is an action that 
has the potential to cause serious damage. Threats can lead to 
attacks on computer systems, networks and more. Also, they 
can turn potential vulnerabilities into attacks [22]. They can put 
individuals’ computer systems and business computers at risk; 
therefore, vulnerabilities have to be fixed so that attackers 
cannot infiltrate the system and cause damage. Threats can 
include everything from viruses, trojans, and back doors to 
outright attacks from hackers. Generally speaking, the term 
blended threat is more precise, as the majority of threats 
comprise multiple exploits. For example, a hacker might use a 
phishing attack to gain information about a network and break 
into a network [15, 23]. 

1) Trojan: 

Trojan is one of the most intricate threats of all. The Trojan 
family such as Zeus and SpyEye comprise most of the banking 
threats. It can hide itself from antivirus detection and take 



important banking data to compromise your bank account. 
Furthermore, if the Trojan is really powerful, it can take over 
your entire security system as well. Consequently, a Trojan can 
cause numerous types of damage to your own computer and to 
your online account [24]. 

2) Virus: 

10 years back, Virus was widespread. It is a malicious 
program where it duplicates itself and aims only to destroy a 
computer. The ultimate goal of a virus is to guarantee that the 
victim’s computer will never be able to function properly or 
even at all. Today, It is not common because Malware today is 
designed to earn money over destruction. Accordingly, Virus is 
only used by people who want to take revenge on their 
adversaries [24]. 

3) Worms: 

It is program designed only to spread and thus it is one of 
the most harmless threats. It does not change your system, but 
it can spread from one computer to another within a network or 
even through the internet. The computer security risk here is 
that it will consume your computer hard disk space due to the 
duplication and will take up most of your bandwidth as a result 
of its spreading [24]. 

4) Spyware: 

This Malware is intended to spy on the victim’s computer. 
When infected with it, the daily activity or a certain activityby 
the computer user will be spied by the spyware which will find 
a way to contact the host of this malware. Typically, this 
spyware is used to know what the daily activity of the 
computer user is so that the attacker can make use of his data. 
For instance, if you browse on sex toys every day for a week, 
the attacker will try to come out with a sex toy scam to rip off 
your money [24]. 

5 ) Scareware: 

Scareware is something that you plant into your system 
and instantly it will notify you that you have hundreds of 
infections which you don’t have. It tries to trick you into 
purchasing a false anti-malware that claims to eliminate those 
threats. It is all about ripping off your money through scaring 
you to buy the malware [24] . 

6) Key logger : 

It keeps a record of every keystroke you made on your 
keyboard. . It is a sub-function of a powerful Trojan that 
threatens to steal people’s login credential such as username 
and password [24]. 
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7) Adware: 

Adware is a form of threat where your computer will start 
popping out a lot of advertisement. These advertisements can 
be anything from non-adult materials to adult materials as any 
type of ads will make the host some money. It is not really a 
harmful threat but it can be really annoying [24]. 

8) Backdoor: 

Backdoor is not really a Malware but rather a method 
where once a system is exposed to it, the attacker will be able 
to bypass all the regular verification service. It is usually 
installed before any virus or Trojan infection because having a 
backdoor installed will facilitate the transfer of those threats 
[24]. 

9) Wabbits: 

Is another self-replicating threat but it does not work like a 
Virus or Worms. It does not harm your system like a Virus and 
it does not replicate via your LAN network like Worms. An 
example of Wabbit’s attack is the fork bomb, a form of DDoS 
attack [24]. 

10) Exploit : 

Exploit is a software specifically designed to attack a 
certain vulnerability. For instance, if your web browser is 
vulnerable to some outdated vulnerable flash plugin, Exploit 
will work only on your web browser and plugin. The way to 
avoid Exploit is to always patch your stuff because the function 
of software patches is to fix vulnerabilities [24]. 

11) Botnet: 

Botnet is installed by a BotMaster to take hold of all the 
computer bots via the Botnet infection. It mostly infects 
through drive-by downloads or even Trojan infection. The 
result of this threat is the victim’s computer, which is the bot 
will be used for a large scale attack like DDoS [24]. 

12) Dialer: 

This threat is no longer common today, but looking at the 
technology 10 years back or more when we used to access the 
internet by means of a dial-up modem, it was quite a popular 
threat back then. What it does is that it makes use of your 
internet modem to dial international numbers which are pretty 
expensive. Today, this type of threat is more popular on 
Android as it can make use of the phone call to send SMS to 
premium numbers [24]. 

13) Dropper: 



As the name suggests, a Dropper is designed to drop into a 
computer and install something beneficial to the attacker such 
as Malware or Backdoor. There are two types of Dropper: the 
first type immediately drops and gets installed to avoid 
Antivirus detection. The second type only drops a small file 
which will auto trigger a download process to download the 
Malware [24]. 

14) Fake AV: 

Fake Antivirus threat became a very popular threat among 
Mac users about 10 months ago. Because Mac users rarely face 
a virus infection, scaring them with message which tells them 
that their computer is infected with virus is pretty useful as it 
results into them purchasing a bogus antivirus which does 
nothing [24]. 

15) Phishing: 

Creating a false website designed to look exactly like the 
actual website where the users are tricked into entering their 
username and password into the fake login form which is used 
to steal the identity of the victims. Every form sent out from the 
phishing site will not go to the actual server, but to the attacker 
controlled server [24]. 

16) Cookies: 

Cookies are not really a Malware as such. It is used by most 
websites to store data into your computer. It is used because it 
has the ability to store things into your computer and track your 
activities within the site. If you really don’t like the presence of 
cookies, you can choose to disable them for some of the sites 
which you do not know [24]. 

17) Bluesnarfing: 

Bluesnarfing is having an unauthorized access to specific 
mobile phones, laptops, or PDA via Bluetooth connection. By 
having such unauthorized access, personal stuff such as photos, 
calender, contacts and SMS will all be revealed and probably 
even stolen [24] . 

18) Bluejacking: 

Bluejacking also uses the Bluetooth technology, but it is not 
as serious as Bluesnarfing. What it does is that it connects to 
your Bluetooth device and sends a message to another 
Bluetooth device. It does not damage your privacy or device 
system compared to the Bluesnarfing threat [24]. 

19) DdoS: 
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This is one of the most well-known activities done by 
Anonymous attackers where millions of traffic is sent to a 
single server to cause the system to disable a certain security 
feature so that the anonymous attackers can do their data 
stealing. This kind of trick which is to send a lot of traffic to a 
machine is known as Distributed Denial of Service, also known 
as DDoS [24]. 

20) Boot Sector Virus: 

It is a virus that places its own codes into computer DOS 
boot sector. It is also known as the Master Boot Record. It will 
only start if there it is injected during the boot up period where 
the damage is high but difficult to infect. The victim needs to 
do if he realizes there is a boot sector virus is to remove all the 
bootable drive so that this particular virus will not be able to 
boot [24]. 

21) Browser Hijackers: 

A browser hijacker uses the Trojan Malware to take control 
of the victim’s web browsing session. It is extremely dangerous 
especially when the victim is trying to send some money via 
online banking because the hijacker can alter the destination of 
the bank account and even the amount [24]. 

22 ) Chain Letters: 

When I was small, I got tricked with chain letters written by 
my friend. But chain letters do not stop at that era. Even adults, 
like to send chain letter. Chain letters like Facebook account 
delete letters which say if you don’t forward that particular 
message or email to 20 people or more, your account will be 
deleted, are simply not real, yet people really believe them 
[24]. 

23) Virus Document: 

Nowadays, Virus can be spread through document files 
especially PDF document files. In the past, people would only 
advice you simply not to execute an EXE file, but in today’s 
world with today’s technology, document file should also be 
avoided. It is best if you use an online virus scanner to scan any 
suspicious file before opening it [24]. 



25 ) Obfuscated Spam: 

In actual fact, obfuscated Spam is a spam mail. It is 
obfuscated in the way that it does not look like any spamming 
message so that it can trick the potential victim into clicking it 
[24]. Spam mail today looks very genuine and if you are not 
careful, you might just be attracted to what they are offering 
[24]. 

26) Pharming: 

Pharming works more or less like phishing but in a more 
complicated way. There are two types of pharming: one is DNS 
poisoning where your DNS is being compromised and all your 
traffic will be redirected to the attacker’s DNS. The other type 
of pharming is to edit your HOST file where even if you typed 
www.google.com on your web browser, it will still redirect you 
to another site. Both types are equally dangerous [24]. 

27) Crimeware: 

Crimeware is a form of Malware which takes control of 
your computer to make you commit a computer crime on 
behalf of the attacker. Instead of the hacker himself committing 
the crime, it plants a Trojan or whatever the Malware is called 
to order you to commit a crime instead. This will make the 
hacker himself clean from whatever crime that he had done 
[24]. 

28) SQL Injection: 

SQL injection does not damage the end users directly. It 
aims to infect a vulnerable website. What it does is that it gains 
unauthorized access to the database and the attacker can 
recover all the valuable information stored in the database [24]. 

II. IDPS Challenges 

IDPS as any other system got some challenges that require 
additional research. 

A. IDS rules got different format 

One of the challenges in IDS rules’ analysis is the absence 
of the common format. [25]. IDS rule are different from 
firewall rules in the following areas: 



24) Mousetrapping: ]} Firewall ndes often have 

a common format but IDS rules 

don ’t. 

I am not sure if you had encountered a Mousetrappmg 
Malware before where it traps your web browser to a particular .. _ ,, „ T TT ... 

website only. If you try to type another website, it will F,rewaU mles lmmlly have a smaU set °f act '° ns such as 

repeatedly redirect you back. If you try clicking (allow, deny) whereas IDS mles got different responses. 

forward/backward of the navigation button, it will also redirect 

you back. If you try to close your browser and re-open it, it will 3) Rule ordering plays essential role in firewall rules while it 
set the homepage to that website and you can never get out of is not necessarily in IDS rules. 
this threat unless you remove it [24]. 
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B. Current IDPS doesn 't support mobile ad-hoc environment 

Compared with wired networks; traffic monitoring is 
usually done at switches, routers and gateways, the mobile ad- 
hoc environment does not have such traffic concentration 
points where the IDPS can collect audit data for the entire 
network. Therefore, the only available audit trace will be 
limited to communication activities taking place within the 
radio range, and the intrusion detection algorithms must be 
made to work on this partial and localized information [1, 26, 
27]. 

C. The amount of alerts generated by IDS 

After turning on the typical intrusion detection system for 
the first time, the IDS station will get spammed. Intrusion 
detection systems regularly give off over 10,000 alerts a day. 
For sure not those entire alerts map to real intrusions. A 
separation process is needed to get value out of an intrusion 
detection system; separation process “tuning process” is very 
expensive upfront cost. And, even after tuning, there can be a 
significant ongoing cost to look at alerts that you might want to 
review [28-30]. 

D. Network Architecture Issues 

This challenge is for the NIDS solution since it listens to all 
traffic of the network. Switches are an issue for a NIDS (switch 
forwards traffic to the ports that have devices involved in a 
given conversation). Using switches in networks led to many 
advantages such as (eliminate collisions, reduce processing 
power required on terminating devices and they make 
malicious packet sniffing much more difficult). The last 
advantage of switching is a problem for a NIDS [4, 10, 23, 31]. 

E. High Speed Networks 

The network speed increases ahead of the processing power 
available to process every packet off the network. Current high- 
end systems should be able to cope with a marginally loaded 
gigabit network, but will not be able to come close to coping 
with a fully loaded 10G network [31]. 

F. Type of IDS alerts 

Alerts are the reason for having and IDS. Without them you 
simply have a complex sniffer. There are two types of alerts 
that IDS triggers either true alerts or false alerts. IDS generates 
thousands of alerts per day, analyzing these alerts will cost the 
organization time and effort which causing tension to the 
system analysts. Most system analysts consider these alerts as 
false positive alerts while they can be normal noise caused by 
the IDS (login failure on a password authentication server) 
[10]. There are four types of IDS alerts: (false positive alerts, 
false negative alerts, true positive alerts and true negative 
alerts ), keeping in mind that positive = identified and negative 
= rejected, therefore: (True positive = correctly identified, False 
positive = incorrectly identified, True negative = correctly 
rejected, False negative = incorrectly rejected). 

1 ) False positive: 



Is a mystery term that describes a situation where the IDS 
trigger an alert when there is a malicious activity in simple 
words (IDS makes a mistake). When an IPS has a false 
positive, the primary concern is that legitimate traffic will be 
blocked. IPS false positive is a much more serious matter than 
an IDS false positive, if an IPS blocks legitimate traffic a few 
times; blocking legitimate traffic consider as much more 
serious problem than generating a false alert. The difference 
between IDS and IPS false positives also means that some 
types of filters that are appropriate for IDS are not appropriate 
for IPS. IDS filters leads on suspicious activity intended for a 
human to follow, while IPS filters are used for automatic action 
such as blocking traffic or quarantining an endpoint. There are 
many reasons behind the false positive such as: 

a) If the training stage (in IDS anomaly base ) was not 
trained enough, this will make the running stage got so many 
false positive alerts. 

b) If IDS rules are (in IDS signature base) framed and 
adjusted; several of similar false positive alerts will be 
generated by IDS. IDS rules should detect intrusions based on 
pre-defined rules, tuning these rules will led to minimizing the 
amount of alerts which will kill a huge number of false positive 
alerts. This method is effective in IDS (An IDS false positive is 
an alert that did not result in an intrusion.) but it so dangerous 
in the IPS. Since the primary concern is that legitimate traffic 
will be blocked, and blocking such traffic will led to yank it out 
of the network. 

c) There are different research papers was introduced to 
reduce the false positive alerts, these papers stated that there is 
two different ways to study the false positive reduction either by 
studying the false alert reduction at the sensor level or at the 
log alert file ( after intrusion accrued) [13]. 

2 ) False Negative: 

Is a genuine attack that had not been noticed (Missed 
Alerts). False negatives are hard to quantify (it's difficult to 
figure out what you don't know). False negative create two 
problems (there are missed attacks that will not be discovered, 
false negatives give a false sense of security) [30]. There are 
many reasons behind the false negative such as: 

a) Traffic encryption: Encrypted traffic does not trigger 
alerts because the signature patterns don't match. 

b) Faulty signatures: It may be that a signature is written 
incorrectly (in IDS with misuse detection method) and are not 
watching for the correct attack signature. 

c) Any environment relying on anomaly detection 
method should be relying on file changes, the assumption must 
be that at the time of training the network or system was not 
compromised. If this is not true there will be false negatives for 
any already exploited conditions. 

d) Poor change management: Mostly there are no 
separate departments in most organizations to handle network 
administration, and security, which causes a lack in the 
signature updates. 
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e) Snort sensor administration problems: ( Snort is a 
popular IDS) [30] tuning the Snort rule set to control false 
positives, while eliminates an important rule may cause an 
attack or probe to be unnoticed [31 ]. 

c) An overloaded IDS will drop packets potentially 
causing false negatives. 

3 ) True Positive Alerts 

A legitimate attack which the IDS triggers an alert for it, 
there is no much in the web about this type of alerts since it 
should be about how the IDS works [32] . 

4 ) True Negative Alerts 

Is the state of IDS where there is no attack has taken place 
and no alert is triggered. This will happened when all the rules, 
tools and signatures have evaluated the packet/ log and found 
nothing to match. True negative alerts are an important term 
that should be considered when tuning up the IDS, since bad 
tuning will affect the sensitivity of security system, and 
correlating these alerts will make the IDS alerts more 
significance [33]. 

III. Conclusion 

IDPS as any exciting system needs to be improved, this 
article discusses IDS and IPS, the threats that IDS is trying to 
catch, the myths behind these two systems, the challenges that 
IDS faces and the types of alerts that IDS triggers. 

This article helps network security researchers know the 
state of art stage that the IDPS reaches, so they can start from 
that point to build their own research. By the finding of this 
article the researcher came out with: 1- A proof that IDS and 
IPS are not the same system. 2 -The type of threats are defined 
and categorized. 3 -There is a lack of researches to cover IDS 
true positive alerts and true negative alerts , 
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Abstract — For improving the quality of composite services in 
cloud infrastructure this paper proposes an integrated control 
architecture using the NVF technology that provides load 
balancing with estimation of available system resources. The aim 
of our proposed algorithms is to assess existing physical and 
virtual, resources of telecommunication system. The analysis is 
based on the maximum value of integral resources index of each 
physical machine. Maximum values of available virtual and 
physical resources are transferred to Orchestrator, which 
provides the migration of service components to less loaded 
servers if necessary. Performance analysis of the proposed 
approach shows that load balancing by implementing integrated 
management architecture based on NVF technology allows to 
reduce the duration of service requests by 3 times. 

Keywords - cloud infrastructure management; load balancing ; 
NVF ; analysis cloud resources; duration of service requests 

I. Introduction 

Nowadays permanent traffic growth in the World Wide 
Web is observed. This is resulted from the fact that users strive 
to remotely operate with data regardless of location instead of 
store data locally. This concept requires increasing the 
performance of hardware and software of the provider’s 
infrastructure as well as users’ equipment improvement. 
However, increasing of computing capabilities may be 
unreasonable due to the high cost deployment. Therefore, 
providers prefer automated services based on the cloud 



technology instead of straightforward hardware enhancement. 
This approach provides three basic service models: 
infrastructure as a service (IaaS), platform as a service (PaaS) 
and software as a service (SaaS). These models require a good 
scalability and flexibility of existing hardware resources. The 
key technology that provides effective utilization of avaliable 
resources is virtualization [1]. Virtualization enables more 
efficient servers utilization that allows to decrease the amount 
of necessary hardware resources. Virtualization also provides 
live migration, especially when the server is overloaded and a 
copy of operating system with its applications can be moved to 
another less loaded server. 

Load balancing increases the performance of cloud 
computing systems in terms of data flows distribution within 
the set of interacting hosts [2,3]. The aim of such a system is to 
prevent overloading of physical servers and distribute overall 
workload between available physical servers as even as 
possible. 

Typically, the decision to migrate a virtual machine is taken 
based on the current workload of servers: available bandwidth 
of logical and physical channels, available computing resources 
(memory, CPU, etc.) [4]. In this paper, we propose a new 
approach for the evaluation of these parameters that allows to 
increase the effectiveness of load balancing by introducing an 
integrated control architecture based on the NVF (Network 
Function Virtualization) technology.. 
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II. Recent work 

The problem of available resources estimation in data 
centers is of great interest for both industry and academia [5-8]. 
Many researches have been published so far on the problem of 
resources availability at different levels of a cloud system. . 

In [5], Soares et al. have studied the availability of physical 
servers in dynamic deployment of virtual machines.. Their 
approach is to develop Cloud4NFV (Cloud for NFV) platforms 
with VNF (Virtual Network Functions). Proposed platform 
includes the simulation of virtualized network functions as 
infrastructure resources. However, proposed platform and 
designed algorithm does not take into account available 
hardware and software resources of each physical machine. 

Another approach proposed by Ma et al. [6] is to analyze 
the availability of complex applications based on the service- 
oriented architecture. The load balancing model was developed 
using TOPSIS (Technique for Order of Preference by 
Similarity to Ideal Solution) method. Authors claim that 
proposed system can achieve better load balancing in a large- 
scale cloud computing environment with less number of virtual 
machines migrations. However, proposed approach is not 
reliable for scenarios with dynamic variations of data centers 
parameters due to distance vector routing base. Alternative 
approach in [7] provides adaptive dynamic migration of virtual 
machines by implementation ofdistributed load balancing 
algorithm. 

Input queue Qi 



Proposed algorithm is based on a selective tracking of 
migrations between VLANs (Virtual Local Area Networks) to 
simplify the control plane and reduce reconfiguration time of 
the data center. The simple model has been implemented that 
allows to decrease the time of virtual machines migration 
between data centers by converting them into Red Hat Cluster 
service. However, the implementation of proposed algorithm 
requires significant hardware resources and does not estimate 
the status of other physical machines. Therefore, it is necessary 
to develop a new approach to evaluate the performance of 
available resources and to increase the effectiveness of load 
balancing. In this paper, we develop an integrated control 
architecture using the NVF technology that provides load 
balancing with estimation of available system resources. The 
aim of our proposed algorithms is to assess existing physical 
and virtual, resources of telecommunication system. The 
analysis is based on the maximum value of integral resources 
index of each physical machine. 

III. Proposed control plane architecture based on 
NVF FUNCTIONS 

We propose the control plane architecture (Fig. 1) that 
includes: 

• Virtual Infrastructure Manager (VIM), which is 
responsible for NFV resources management; 

Output queue Q2 




Figure 1 . Conceptual model of management system for cloud-service provision. 
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• VNF manager - responsible for managing the lifecycle 

of VNF instances including copying, configuring, 
updating, up/down scaling). VNF manager is 

responsible for the entire life cycle of VNF providing 
the middleware between virtualized fuctions and 
Orchestrator. For example, VNF manager provides 
installing and configuring of software components; 

• Resource discoverer (RD) - responsible for analysis of 

system resources. Its responsibilities include 

assessment of existing physical and virtual, computing 
resources as well as throughput. Based on the available 
bandwidth and availability of service components, the 
Orchestrator makes a decision to provide the service, 
or to migrate service components to a less loaded 
server in order to provide load balancing and optimal 
resources utilization; 

• Migration manager - is responsible for the migration 
of virtual machines and informs the VDE (Virtual 
Distributed Ethernet) switch manager about VM 
(Virtual Machine) new location (number of VLAN, 
port, , etc.); 

• VDE switch manager - supports and manages VDE 
switches allowing combination of virtual machines in 
VLANs to simplify their management. 

• VDE switch is a virtual analogue of conventional 
Ethernet- switch. All virtual devices connected to VDE 
have the opportunity to interact with each other in a 
way like conventional devices interact in the real 
Ethernet network. VDEs also able to perform fast 
network reconfiguration regardless of ports quantity 
and without influence on the input queue . This feature 
is important to minimize end to end delays of the data 
transmission . VDE switch allows to control the data 
flow by adjusting the number of packets between 
electronic buffers of input and output ports. 

• Orchestrator - responsible for orchestration and 
management of NFV resources and provides services 
basen on NFVI (Network Function Virtualization 
Infrastructure). Orchestrator handles following 
functions: 



random variable with exponential distribution. The request of 
each service is putted into the input queue. Orchestrator 
provides instructions to Resource discoverer to check whether 
all components of required services are available, to wit there is 
enough computing capacity of physical and consequently 
virtual machine to process and provide all service components. 
While Resource discoverer checks the availability of free PM 
(Physical Mashine) resources, Network discoverer checks the 
status of physical and logical channels determining the optimal 
transmission for components exchanging of using the least 
loaded channel. If the Resource discoverer or Network 
discoverer reports about the lack of computing power or lack of 
necessary bandwidth, Resource discoverer informs the 
Orchestrator and sends information about available resources to 
Migration Manager, which makes a choice of PM and transfer 
the available component of service (conducts VM migration) in 
collaboration with VDE switch manager. VDE switch manager 
determines which VLAN is overloaded by VM requests and 
which VLAN is underutilized by VM requests. Then, a VDE 
switch performs switching of logic service components and 
provides system reconfiguration in accordance with the 
changes (i.e upgrades routing tables). The whole process of 
migration and operation in the system is conducted by the 
control of Orchestrator and Virtual Infrastructure Manager. If 
number of requests for a particular component sequentially 
increases Virtual Infrastructure Manager runs NFV technology. 
NFV allows to create a software abstraction of any physical 
device and process requests for service provision based on this 
device. 

A. Principle of resources analysis 

As mentioned above, Resource discoverer is responsible for 
analysis of system resources. Its responsibilities include 
assessment of existing physical and virtual computing and 
bandwidthresources. This analysis will be based on the 
maximum value of integral resources index of each physical 
machine. In order to determine this value it is necessary to 
control available hardware resources and their accessibility. 
Accessibility means availability of both computing resources of 
VM and bandwidth resources of communication channels. 
Computing parameters of VM, which hosts M, components, are 
defined as following: 



monitoring and control of service provision by using 
a common interface; 

- processing the system data of information services; 

- making the decision and informs Migration manager 
about necessity of VM migration by using 
information from Resource discoverer and informs 
VDE switch manager to change the location of 
VMs; 



V M: X CPU, 
CPU = ^ i=l - - 

rU pr 

L=i M - 



(i) 



RAM pr = 



M t yRAM : 



( 2 ) 



interacting with VIM and various VNF managers; 

- accessing services and infrastructure of VNF 
module, which contains information about priorities 
of services. 

In our model, we suppose that cloud-system is requested to 
provide a service. The intensity of incoming request is a 



where M, - the number of applications (components) in i - th 
VM; CPUj - CPU (Central Processing Unit) clock speed of 
VM used by the i - th component; RAM \ - RAM of VM, used by 
the i - th component; k - the VMs quantity of one PM in a cloud 
system. 
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We propose a new algorithm to estimate the avaliable 
resources of cloud system (Fig.2). Proposed algorithm provides 
fast requests processing and analyzes degree of utilization for 
each machine by using the information about the processing 
power of each node (in our case -VM). Proposed algorithm is 
described below. 



proportion of channel bandwidth occupation by each 
component relative to the total number of routes through this 
channel. Above mentioned procedure provided as following. 
First, we define the weight coefficient of information flow rate 
for the i th component of j th service relative to the total traffic 
that transmitted by I th channel: 



Let z — 1, Z is a VM, which hosts i = l,K component of 
service j — 1, S , and m = \,M - a PM, which hosts 
Z m = 1 ,Z VMs. We denotesets of routes: 

p={p x ,...,p n ,...p N ], 
n = l,N, P n 

l=\J, 



k _ Ki u 

iJ> ‘ 2X 



(4) 



fit 



Then, the proportion of unoccupied bandwidth for f 
channel is calculates as: 



ifi 



ij\l 



V 0I / 



1-^ 



hi 



C, 



(5) 



where N - number of paths between components i and 
/ + 1 , L - number of edges in the route n , which connect 
VM z with VM z+1 . First, the input requests queue is checked 

for service components. If queue contains f = 1 , F requests 

for a particular component of the J service, the analyzer 
determines which resources are needed to maintain the service. 
Thus, a table of available CPU, RAM and bandwidth resources 
is forming to ensure access to the component within the time 

^ < ^max anc [ t0 determine VM type for each component of the 
composite service. Simultaneously, algorithm checks whether 

•th j th 

the 1 component of J service is available. Each component 
of service uses resources (CPU, RAM) of physical and virtual 
machines on which it is hosted. Resource discoverer checks if 

th 

Z VM has enough hardware resources to provide another 

• th 

instance of the 1 component to the user. If the amount of 
resources is enough, algorithm verifies that necessary channel 
resources are available to establish a reliable connection for 
components transmission to end users. 



C v > R 



CJ 



(3) 



where /?• ■ - information flow rate, which generate / th 

component of j th service. All parameters, which will be 
available at the moment of network status checking by 

Resource discoverer, will be stored in the array A that 
generates service queue. The enough channel bandwidth to 

provide component of a service will be also stored in A . Then, 
Resource discoverer informs the Orchestrator to transferthe 

component to the output queue Q 2 . 

Network discoverer checks the status of the channels and 
determines an optimal path for components transmission and 
exchanging by using the least loaded channel. Because, cloud 
system receives queries of services components, whichare 
transmitted by different routes, we can determine the 



where 



C, 



denotes the bandwitdh of I th channel. 



The value the occupation part of Pr ? ;1/ bandwidth is 
expressed proportionally to k t l and calculated as following: 



PVy'l/ - Kj\l ' (l - ^01/ ) ■ 100% ^ 

We assume that Pr ? - j U meets the priorities of components 

in relation to others, traffic of which is transmitted by I th 
channel. It means that the largest bandwidth will be released 
when current component will be successfully transferred, 
which in turn will make possible to serve all requests with the 
highest efficiency. 

We also assume that the bandwidth of certain route is used 
ineffectively when I V n j has the maximum value. 

Then, VDE switcher manager and Migration Manager provide 
up loading of the less loaded band and is bandwidth 
reallocation between components with higher needs. 

If during the evaluation of free resources we find out that 
there are not enough resources for another instance of the 
component or the next component, i.e: 

CPU available^) > CPU (i, j ) 



RAM available^) > RAM (i, j ) (g) 

C Pn rmn(k 0U \P n )>R ij \P n (9) 

then the replication of z h virtual machine to another physical 
machine PM m is performed. . 

Information about the necessity of component migration is 
sent to Orchestrator and Migration Manager, which moves 
overloaded VM Z in collaboration with VDE switcher manager. 
If migration is not possible, denial of service will happen and 
RD starts again the process of resources analysis. 
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Figure 2. The algorithm to estimate the avaliable resources of cloud system 



Maximum values of free virtual and channel resources will 
be transferred to the Orchestrator, which will provide the 
migration of applications to the less loaded PM if necessary. 

IV. Simulation and performance analysis 

We develop the simulation model of cloud infrastructure to 
analyse the efficiency of proposed approaches. Conventional 
simulation models operate on the principle of discrete events 
that does not reflect peculiarities of processes in the real 
network. In difference to existing models, our model is based 
on the deployment of virtual machines on the physical servers 
and service provision. The UML (Unified Modeling Language) 
diagram of developed model is shown in Fig. 3. 




Figure 3. The block diagram of simulation model based on Unified Modeling 
Language 
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The conceptual model of service provision to the user based 
on proposed management system and balancing method for the 
cloud infrastructure is described in Fig. 4. 
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Parameters of services for simulation 




Figure 4. The conceptual model of service provision based on the cloud 
infrastructure 

Based on the proposed simulation model we develop a 
software tool by using the Qt5.4 programming environment. 
Our tool allows to create an infrastructure with any 
configuration and parameters. In our case, we use following 
parameters: number of physical servers, hardware resources 
allocated to deploy each component, number of services and 
their type. All elements of infrastructure can be configured 
independently, but initially have the same configuration. The 
connection between them is formed by random filling of the 
adjacency matrix. The dialog window for network 
configuration and simulation process control of developed tool 
is shown in Fig. 5. 
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Figure 5. Dialog window for network configuration 

We create a set of services, which are deployed based on 
designed infrastructure. The system infrastructure is 
represented as a matrix: rows define the number of physical 
servers, and columns define virtual machines, which are 
deployed on each server. Service parameters for our simulation 
are presented in Table 1. Designed infrastructure and VMs 
allocation is shown in Fig. 6. 



TABLE I. 



Name of 
composite 
service 


Color 


Requirements for 
computing 
resources 


Addresses of atomic 
services 


1 


Blue 


{59, 59, 59} 


Instance 1 
{1001,2001,3001} 


2 


Black 


{20, 20, 20} 


Instance 2 
{1002, 1003,2002} 


3 


Purple 


{20, 20, 20} 


Instance 3 
{2003, 3002, 3003} 


4 


Azure 


{20, 20, 20} 


Instance 4 
{4001,4002, 4003} 


5 


Green 


{20, 20, 20} 


Instance 5 
{4004, 5001,5002} 


6 


Lilac 


{20, 20, 20} 


Instance 6 
{5003,5004, 6001} 



Flam mana-asmenti- oT ir&q u«t$ 



I nflrastructure monrtaflng 



Flowi management 



i 


1 


i 


4 


5 


6 


PM *1000 (99%) 












PM #2000 (09%) 






vm nm 



3 PM *3000 (m) 

4 PM *4000 (80%) VM*4CK)1 




5 PM *5000(30%) VM *5001 

6 PM *6000(20%) 

7 PM #7000(0%) 

8 PM *6000 (0%) 

9 PM *$DQQ (0%) 

10 PM *10000 (0%) 



Figure 6. The infrastructure of cloud system 

To generate traffic we are using lognormal distribution for 
interval between requests and exponential traffic intensity 
distribution. This combination allows to reflect properties of 
the real traffic in the cloud network. 

All services and traffic generators are active 
simultaneously. The duration of virtual machines existence is 
not limited. Simulation is conducted in three stages. 

In the first stage, an analysis of the network functioning is 
conducted while operate in accordance with the existing 
architecture and principles of cloud network. At this stage the 
resources utilization is analyzed and compared among existing 
physical servers. In addition, end to end delays of packets 
transmissionandnumbers of processed/unprocessed requests are 
calculated. Special attention is given to service with the largest 
number of uprocessed requests. 

In the second stage a monitoring of cloud system 
infrastructure is provided and durations of service requests are 
determined. Using the integrated architecture enables the 
control of available hardware and software resources. 

In the third stage the load balancing algorithm is started for 
services with the largest duration of service requests 
processing. Coordinated operation between load balancing 
algorithms and integrated management infrastructure promises 
to reduce the duration of service requests processing and delay 
of packet transmission. This allows to increase network 
productivity and reliability as well as to decrease the server 
load. 
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We simulate two scenarios: conventional cloud system and 
cloud system with our implemented solutions. Simulation 
results of service requests duration for different types of service 
are shown in Fig. 7. Switching from conventional system to our 
proposed system decreases the duration of service requests and 
enables resources control for each server. 



The duration of service reauests 




Type of service 



a) 




b) 



The duration of service reauests 




Type of service 



The duration of service reauests 




d) 



The duration of service requests 

$ MB - 




h 



d 1M m XA 401 SOB 

Type of service 



e) 

The duration of service requests 




f) 

Figure 7. The duration of service requests for each type of service 



c) 



TABLE II. Parameters of services before and after turning load balansing algorithm 



Name of 
compo- 
site 

service 


Addresses of 
atomic 
services 


The duration 
of service 
requests 


The total 
number of 
requests 
received to be 
processed into 
the system 


Number of 
successfully 
processed 
requests 


The duration 
of service 
requests after 
migration 


The total number 
of requests 
received to be 
processed into the 
system after 
migration 


Number of 
successfully 
processed 
requests after 
migration 


1 


Instance 1 
{1001,2001, 
3001} 


70 mod.sec. 


212 


212 


70 mod.sec 


407 


407 


2 


Instance 2 
{1002, 1003, 
2002} 


150 mod.sec. 


394 


391 


50 mod.sec 


739 


739 


3 


Instance 3 
{2003, 3002, 
3003} 


130 mod.sec. 


367 


365 


130 mod.sec 


711 


706 


4 


Instance 4 
{4001,4002, 
4003} 


80 mod.sec. 


405 


404 


80 mod.sec 


695 


692 


5 


Instance 5 
{4004, 5001, 
5002} 


90 mod.sec. 


374 


372 


90 mod.sec 


732 


728 


6 


Instance 6 
{5003, 5004, 
6001} 


60 mod.sec. 


332 


332 


60 mod.sec 


740 


740 
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Results in Fig. 7 show that majority of requests are assigned 
to the second service with average processing duration of 150 
seconds in modeling time. Note than one second in modeling 
time is equal to the one microsecond in the real system. 
Accordingly, there are not enough hardware and software 
resources to provide additional component of the second 
service, because buffers of virtual machines are overloaded by 
requests. In this case, Orchestrator takes the decision to migrate 
service components of to another physical server. Fig. 8 shows 
the network infrastructure after migration and load balancing. 
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duration of service requests by 3 times. Proposed integrated 
estimation of bandwidth, software and hardware resources 
reduces the time delay of the service provision to end users 
and unloads the most loaded server. Simulation results show 
that migration of service components decrease the average 
duration of requests processing from 150 to 55 microseconds. 
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Abstract - Special operations such as emergency response as well as military missions are usually characterized by 
the limited resources available to handle generally large-scale operations. Precise resource discovery and allocation 
thus becomes an important factor for the success of such operations. This task has been recognized as a challenging 
research issue. This is due to the dynamic nature of the emergency response elements e.g., personnel and equipment. 
One of the important requirements of these operations is achieving the security of the communications involved in 
the resource discovery and allocation tasks. Security ensures the confidentiality, integrity, and availability of the 
communicated information. Therefore, solutions that are intended for selecting best matching service(s) should not 
only rely on the functional properties of the service but also on level of security under which this service is provided. 
In this study, we propose a secure multicast service discovery architecture that is based on a mobile ad hoc network 
(MANET) of operation participants. The main objective is to locate, select, reserve and assign certain resources to 
parties that are in need of these resources. The involved communications are designed to be secure multicast-based, 
utilizing features of the Role-Based Access Control (RBAC) Model. We describe the details of the proposed com- 
munication protocol. We then qualitatively compare our architecture to other alternative MANET-based service 
discovery architectures. The comparison highlights the merits of the proposed architecture. Finally, we conduct and 
present the results of a set of experiments to evaluate key features of our proposed architecture. 

Key words: Service-Oriented Architecture , Secure Service Discovery , MANET \ Special Operations , Access Control, 
Service Selection. 

I. INTRODUCTION 

Mobile ad hoc networks (MANETs) can play a significant role in facilitating the allocation of resources to the 
spots of need while conducting special operations such as search and rescue, forest fire fighting and military 
combat missions. In these operations, participants and equipment can be thought of as resources, each with 
specific attributes and capabilities. For example, in disaster relief operations we may have participants of 
different types e.g., medical personnel, ambulance workers and heavy equipment operators. These participants 
provide services to individuals or parties in need such as trapped or injured persons. Since MANETs can be 
assembled and operated hastily and without the need for any existing network infrastructure, they present a 
good candidate communications solution to use in such situations. 

Different service discovery architectures have been proposed. These approaches rely mainly on the service 
functional attributes (e.g., medical personnel) for service selection. However, non-functional properties, e.g., 
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security levels, of selected services are crucial in contexts such as that of military operations because of the 
mobility of involved participants and the confidentiality of exchanged information. 

Let us consider the following scenario, which deals with an operations field where we have several groups 
engaged in one or more operation. A group could be responsible for a certain geographical area of the opera- 
tions theatre or it could be responsible for a specific task of the mission. Each group has a commander who is 
responsible for overseeing the progress of the task that is assigned to his/her group. This responsibility includes 
locating and assigning resources to operation participants at their request. Considering a specific group, we can 
classify group members into different types of job functions/specialties (roles). 

The scenario can therefore be described as follows: 

1 . A party in need of a service issues a request to the commander of the group. 

2. The commander decides on the type of needed help. 

3. It consequently dispatches a multicast message to all suitable units (service providers). 

4. The providers, which could satisfy the request, respond back to the commander with their service attrib- 
utes. 

5. The commander evaluates the different responses and selects the one that best satisfies the request. 

6. In the event that no suitable provider is available to service the request, the commander communicates with 
commanders of other groups to seek help from their available resources. However, the group-to-group 
communication is beyond the scope of this paper. We only focus on intra- group communications within 
this study. 

7. Once the right service provider is found, it gets requested to move to address the request. 

Based on this scenario, and depending on the requested help, the commander can confidentially find and al- 
locate the requested help on the basis of best available matching attributes to what has been requested as well as 
other additional information such as the provider’s proximity to the requester. Figure 1 gives an illustration of 
the scenario. 




Figure 1: Targeted Scenario of Operation 
This study addresses the following system elements: 
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• A comprehensive service discovery, selection and reservation system architecture that we design for 
use in special operations. This includes the service provider discovery and management system. 

• A role-based secure communication scheme for connecting the network elements of the architecture. 

• A customized multicast protocol. 

• The security structure of system interactions. 

Consequently, we determine the way the selection of the required service providers is done and how the cal- 
culations in this regard are made. We also determine the security scheme by which the information exchanges 
are governed. For service selection, we build an algorithm that evaluates different candidate services using a 
utility function calculated based on a set of weighted non-functional attributes such as service availability, 
proximity and delivery time. 

The rest of this paper is organized as follows. Section II summarizes the related work on service discovery in 
special operations. Section III discusses the security issues in MANET and describes how our approach ad- 
dresses these issues. Section IV illustrates the network infrastructure design including the communication 
infrastructure, protocol base operations, and packet structure. Section V details the service discovery and 
reservation architecture. Section VI describes the experimental scenarios, which we use to evaluate the effec- 
tiveness of our technique using a set of specific metrics. In addition, it provides a comparative study of several 
service discovery approaches based on a set of criteria that we defined. Finally, Section VII concludes the paper 
and suggests future research directions. 

II. Related Work 

Several studies have dealt with the use of service discovery in special and emergency operations. These works 
can be classified into two categories: (1) research architectures and approaches for non-secure service discovery 
and (2) research architectures and approaches for secure service discovery. For non-secure service discovery, 
several techniques have been proposed. These techniques differ in many aspects including the service discovery 
methodology and the underlying architecture. 

The studies in [1], [3], [4], [7] and [8] propose middleware, framework or general infrastructure to handle 
emergency response operations. The studies deal with data flow and management issues but did not address 
security issues within the system. In [2], a system called EmerLoc was proposed for location-aware medical 
applications that use a wireless body area network (WBAN) and an A* algorithm for service routing and 
navigation. Some security considerations have been implemented and were based on the Public Key Infra- 
structure (PKI). In [5], the study introduces an emergency management architecture that relies on an 
agent-based resource discovery in which agents cooperate to monitor and respond to environmental emergency 
situations. In [6], the authors propose a long-distance multimedia wireless mesh network for collaborative 
disaster emergency responses to enable communications under the constraints of limited network bandwidths 
and high network latencies. In [1 1], a multicast-based service discovery protocol for MANET is proposed. The 
protocol is based on the ODMRP multicast algorithm [12]. Any node that needs a service will have to send the 
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request to the multicast group address of the server that provides this service. In [13], the authors link service 
availability to specific areas in which they can be accessed. Service discovery is therefore done on the basis of 
geographical proximities to the available services of each region. In [14], an SLP-based service discovery 
protocol for MANET is proposed. This technique enables the user to discover appropriate service providers and 
also keeps the user informed of the existence and properties of alternative providers. None of the previous 
works addressed security issues in service discovery environments. 

As far as secure service discovery approaches are concerned, many studies have been conducted with emphasis 
on aspects such as the security properties, security implementation and enforcement and the underling security 
architecture. In [15], the authors focus on access control for sensor nodes in Wireless Sensor Networks (WSNs). 
They address the issue of expanding node conciliation in sensor nodes through broadcast and propose a sensor 
node security approach using T-RBAC. They claim that even if a network node is compromised, it can increase 
the whole network availability as it increases a number of available nodes. They have presented a design and 
implementation. Their security analysis and comparison to other approaches indicated the feasibility of their 
approach. In [16], the issues of privacy and security in MANET are addressed. A combination of an access 
control mechanism and a privacy policy is used to ensure that the privacy and security of personal data is 
protected accordingly. In this paper, the authors defined MANET as a temporary network structure, which is set 
up by a group of existing rescue personnel such as policemen, firemen, an army or paramedics during emer- 
gency services. Nodes in the groups are initially configured before the actual network is established. They 
enhanced access control, through the utilization of Group-Based Access Control (GBAC), by incorporating the 
privacy policies together with the access policy. The goal was to achieve privacy with regards to sensitive data 
such as PII or PHI. In [17], the focus is on the security issues in the semi-infrastructured and ad hoc network, 
which is a wireless MANET sub-network connected to a structured backbone network (LAN). These types of 
networks are gaining popularity but their security is considered as a major obstacle in their advances due to their 
uncontrolled medium access, dynamically changing topology and mobility of the hosts. 

The studies in [18] and [19] focus on devising XML-based security-aware architectures and infrastructures. 
This type of architecture is relatively heavy for use in limited bandwidth wireless environments where simple 
handheld devices are usually used for communications. In [20], the authors focus mainly on providing a scal- 
able solution for securing the service discovery mechanisms based on a new concept of Attribute Based En- 
cryption, which is derived from the Identity Based Encryption schemes. In [21], the authors propose a service 
discovery protocol with security features, named a Secure Pervasive Discovery Protocol (SPDP). It is based on 
an anarchy trust model, which provides the location of trusted services, as well as a protection of confidential 
information, secure communications or access control. The authors in [22] present their implementation of a 
system that provides a communication and security infrastructure to enable clients to access and utilize services 
in heterogeneous networks. In [23], the author proposes a trust-aware routing protocol called TARP that focuses 
more on the trusted availability and quality of trust as important factors in securing Ad Hoc networks. In [24], 
the authors propose a new service discovery model called Splendor that emphasizes security and supports 
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privacy. Location awareness is integrated for location-dependent service discovery and is used to lower service 
discovery network infrastructure requirements. In [25], the authors present a deep review and analysis of 
service discovery protocols in multi-hop mobile ad hoc networks including a suitability assessment of these 
protocols based on architecture, mobility and network size. In [26], a trust-based dynamic secure service dis- 
covery model has been proposed to measure the trust value of a service requestor based on SO A. In [27], the 
study presents a cross-layer service discovery solution tailored for use in MANETs. The solution integrates 
Web services Dynamic Discover (WS-Discovery) and an interoperability gateway to enable service discovery 
across network boundaries thus connecting mobile solution systems and deployed tactical systems. In [28], the 
paper proposes a proactive service discovery approach for pervasive environments relying on a formal context 
model called Hyperspace Analogue to Context, which effectively captures the dynamics of the context and the 
relationship between services and context. In [29], the authors propose ubiquitous bindings for Service Com- 
ponent Architecture applications that modularize the discovery elements thus promoting the sharing of the 
common discovery functionalities and simplifying the integration of discovery protocols. 

In [30], the authors describe a model for dynamic monitoring in ad hoc networks, which serves as the trust-base. 
The model requires some trusted nodes to play the role of the authentication server and the policy rules en- 
forcement server when the previous servers leave the network. The paper in [31] proposes a secure access 
architecture that integrates an access control model to ensure that data travel among groups of rescuers is secure 
and the data integrity is preserved. An access control model is used to specify access policies. Cryptographic 
protocols, has functions and digital signatures are used for confidentiality, integrity and authentication of 
participating members. 

Our study offered an architecture that aims at managing emergency response operations via the use of secure 
role-based service discovery over multicast. This enables hierarchical access to information among operation 
participants. The hierarchy is constructed on the basis on the operation participants’ rank. This is done while we 
address the issue of non-existent or hard-to-access network infrastructure via the use of MANET. We also 
ensure the efficient use of the available bandwidth through the use of multicast as the underlying communica- 
tion technique. To the best of our knowledge, we are the first to provide such a comprehensive architecture to 
the secure special operations problem. 

III. Security Threats Discussion 

The nature of MANET networks, which are infrastructureless and consist of different types of mobile devices, 
and the lack of centralized administration, makes this ubiquities environment very vulnerable to security attacks. 
In this section, we discuss some of the expected security threats to MANET networks, in general, and to our 
service discovery scheme in particular. We define the security requirements for system protection. Finally, we 
introduce some techniques to mitigate the risk posed by these threats and attacks. 
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Security threats against MANET exist at different layers of the TCP/IP protocol stack. They range from 
eavesdropping, to replication, and modification of data exchanged among nodes on the MANET network. This 
is due to the vulnerable nature of MANET, which can be described as follows: 

• Dynamic topology: Due to the fact that each node can join and leave the network independently, this 
introduces the possibility for malicious nodes that can harm the network. 

• Lack of clear line of defense: MANET has no clear lines of defense available, which makes it possible 
to attack the network from any direction. 

• Wireless links: Due the wireless interconnection interfaces in MANET, attacks against links can be 
very effective. 

• Ubiquitous resources such as different device types “e.g., laptops, desktops, and mobile phones”, 
which require different connectivity and have different storage capacities and different processing 
capabilities. This encourages more sophisticated attacks. 

Attacks against MANET can take the form of passive and/or active attacks. While passive attacks do not 
disrupt the operation of the network as they only tap on the lines, listen and gather information, active attacks, 
on the other hand, are more disruptive and can render the network dysfunctional. 

Active attacks include, but not limited to, the following: 

• Flooding Attacks, such as the HELLO flood attack where the attacker node floods the network with a 
high quality route with a powerful transmitter. 

• Dropping attacks: which force compromised nodes to drop all packets that are not destined for them, 
which leads to the denial of service. 

• Modification attacks: these attacks (also known as sinkhole attacks) can modify routing information in 
packets, which can disrupt the entire network. 

• Fabrication attacks: in this type of attacks, adversaries can inject fake packets into the network or send 
fake responses to legitimate requests. 

• Network Layer attacks: examples of such attacks are the reply attacks, blackhole attacks, sinkhole 
attacks, and Sybil attacks. 

A. Security requirements of the proposed architecture 

The key architectural requirements in order to secure service discovery in MANET are as follows: 

• Confidentiality: This requirement ensures service discovery data secrecy. 

• Integrity: this requirement protects against any unauthorized alteration of service discovery data. 

• Availability: which makes sure that the system resources are available when needed. 

• Authentication: especially in such an open and dynamic environment. This requirement ensures the 
authenticity of the communicating entities involved in service discovery. 

• Non-repudiation: this resolves any disputes that result from one entity denying doing the action. 
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• Resilience to attacks: this is critical because it ensure continuity of service discovery operations in 
cases when some the key infrastructures are down or compromised. 

B. Risk Mitigation 

There are many approaches and techniques discussed in the literature to address the security attacks on MANET. 
However, in this paper in order to achieve security and privacy, we focus on access control policies to control 
the access to system resources. For this purpose, we utilize the feature of the public key infrastructure “pub- 
lic/private keys and public key certificates” to enforce these access control policies as well as to address the 
issues of confidentiality, integrity, authentication, and non-repudiation. 

The proposed solution therefore consists of the following elements: 

• Access control: to control access to system resources, we utilize the features of the well-known 
Role-Based Access Control (RBAC), which defines all the job functions on the network as roles. It 
then assigns some permissions/access to these job functions to accomplish their task. Finally, it assigns 
users to roles so that they can gain the permissions assigned to their respective roles. 

• In our approach, a global access policy is specified by network owners and can be enforced using a 
designated role in RBAC (usually the group leader) to serve as the global access policy management 
server (GPMS). Furthermore, due to the mobility nature of MANET, an agent node is also designated 
to assume the role of the GPMS in case that role leaves the network. The processes and the functional 
steps of granting access to the multicast group to the data are elaborated further in subsequent sections 
of this paper. 

• PKI and Certificates: we utilize PKI (X.509), which already exists in most operating systems today 
e.g., Windows built-in PKI, for confidentiality, authentication, integrity and digital signature for 
non-repudiations. We also establish public directories, i.e., FDAP; to store all public keys, public key 
certificates and public key revocation lists. 

IV. Architecture Design 

In this section, we introduce the architecture of the proposed system. We also provide the details of each ele- 
ment of this architecture with emphasis on the communication elements that serve the secure service discovery 
goals of the system. 

A. Architecture Overview 

Figure 2, describes the architecture of the proposed secure service discovery based multicast communication 
system. The architecture is composed of three layers, namely, the Communication Fayer, the Access Control 
and Security Fayer, and the Service Application Fayer. Interfaces between layers represent the communications 
between components of the different layers. The scheme illustrates the details of the main components involved 
in the secure multicast service discovery, selection, and reservation processes. However, it does not specify the 
proper sequence of events within the same layer and between layers. For example, security attributes usually 
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come before access control. So, access control is implemented jointly within both the group access protocol and 
the data access protocol. Likewise, service discovery results in the application layer are used by the service 
selection whose result is used as an input to service reservation. Each component at each layer is briefly de- 



scribed next and will be further discussed, along with the sequence of events, in the subsequent sections. 
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Figure 2: Overall Architecture 

• Communication Layer, provides the needed communication protocols used to support communication 
(e.g., service discovery) among involved parties in emergency operations. The used protocol is based on the 
multicast protocol and is comprised of two main elements, namely, the group access protocol, and the data 
access protocol. 

• Access Control and Security Layer, secures the communications provided by first layer by providing the 
necessary security measures which include credential establishment based on public and private keys and 
access control mechanism, which is implemented using public key certificates, that provide authentication 
of who is allowed to join the multicast group and who will subsequently be allowed or denied to view the 
multicast data. It also manages the role-privileges relationship. 

Figure 3 depicts the concept of access control architecture, it is based on a client/server environment, where the 
client is the group member requesting a service and the server is the group leader accepting or rejecting the 
request from the group member. The group leader then acts as a client when it requests the service from the 
server provider on behalf of the requesting member. 

The figure shows that when a group leader “a server” receives a request from a group member “a client”, it 
checks the client’s credentials and uses three access control components to make a decision on whether to allow 
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or deny access to the resource. The resource is made available to the group leader after the group leader re- 
quests and gets granted access to the resource by the service provider. In the latter case, the group leader acts as 
a client of the service provider. 

The authentication module of the access control components verifies the identity of the requestor using his 
public key certificate. The authorization module consults the access control policy, specified by RBAC, to make 
sure that role of the group member (i.e. the client) has the proper permissions to access these resources. 

The cryptographic module ensures the confidentiality of the communicated information. It also provides a 
digital signature capability for ensuring authentication and non-repudiation. 
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Figure 3: Access control Architecture 



• Service Application Layer, relies on the first two layers to provide secure service discovery, selection and 
reservation, as well as inter-domain discovery. 

As seen in Figure 3, each node includes only the elements of the architecture that are needed by its functionality. 
Therefore, the architecture elements that are implemented on the group leader node are generally different from 
those supported on a group member. 

B. Communication Layer Details 

The communications protocol that is used for the different interactions for the service discovery process in- 
cludes provisions for the base operation and security parameters establishment and exchange. The operation of 
the protocol depends on creating one multicast group for all operation participants. Any member who is part of 
the operation can join the multicast (MC) group. The members, however, are classified in a hierarchical manner 
based on their role in the operation and possibly their rank, if applicable. This role (or rank) is usually deter- 
mined and programmed within the node prior to the operation. Depending on the nature of the operation at 
hand, several roles may be assigned to a node, each with its own credentials. That is, the user may choose to 
activate a specific role, depending on the nature of the current task, and provide the right credentials for the 
active role while communicating within the network. Based on the role, a member node may or may not be able 
to: 



53 



http://sites.google.com/site/ijcsis/ 
ISSN 1947-5500 







International Journal of Computer Science and Information Security (IJCSIS), 
Vol. 13, No. 9, September 2015 



1 . Grant other nodes access to the MC group. 

2. Access some of the communication content being exchanged within the group. 

Having one MC group for all members and granting access to information on the basis of participants’ roles has 
many advantages over having a separate MC group for each participant’s role. For instance, having one group 
provides the following merits: 

• This arrangement saves the overhead of members trying to join several groups for which they are al- 
lowed to access the data according to their rank/role. 

• Once we establish one group, which all members can join, there is no need for creating new groups as the 
need arises, and hence saving the delay in doing so. 

• Data sharing is easier and more efficient for members who can access data that belong to different roles 
(e.g., high ranking personnel). This is not the case when using several MC groups, one for each role. 

The core operation of the underlying multicast communication protocol is based on the study of [27]. The 
establishment and maintenance of the multicast group is composed of the following main tasks: 

• Credential establishment 

• Group formation and access 

• Data access process 

The multicast group formation and member access are done on the basis of node roles and the privileges that are 
assumed by these roles. The entire security arrangement is described in detail in the following sub sections. 

1 ) Credential Establishment 

This phase is used to establish the security capabilities that will be associated with all entities involved in the 
multicast communications. It includes authentication, confidentiality and data integrity. The establishment of 
the security capabilities includes the following: 

• Establishing public key infrastructure: a centralized architecture that comprises a certificate authority 
that issues and distributes a public key certificate, which binds the name with the corresponding public 
key, and attribute certificate that can be used to enforce the implementation of RBAC. Also, directory 
service (i.e. LDAP) can be established to host public key certificates, certificate revocation lists (CRLs), 
attribute certificates (ADs), which are made available to everybody to retrieve public keys and other key 
information of others. Finally, this infrastructure requires the existence of certificate validation authority 
and authorization servers that store access control lists to enforce access control. This is usually ac- 
complished using a customized version of the ISO reference monitor. It is worth noting that the proposed 
infrastructure can be housed on the operations command center, which acts as the centralized server. All 
MANET nodes can access this server either directly or via multi-hop communications. Alternatively, a 
high-ranking group member node can be used to house this infrastructure, with the possibility of having 
backup nodes to replace it in case of failure. 
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• Secret “shared” keys are also pre-established between the commander and the members of each role. 
These keys distributed using the public keys in order to protect the data that are exchanged communi- 
cated between the commander and the group members. 

• The public key and the secret key cryptosystems are assumed to be used in the following processes: 

■ Service registration and deregistration: during this process, a mutual authentication between di- 
rectories and providers is ensured. 

■ Service discovery: in this process, confidentiality, using secret keys or public keys, is used to ensure 
that only authorized clients are allowed to discover services. 

■ Service delivery: here integrity, using public keys/digital signatures, is achieved to protect against 
malicious tampering or accidental modifications. 

2) Group formation and access 

As we discussed above, it is assumed that each node has been preprogrammed with its role and rank within the 
operation at hand. The group and data access will therefore be based on these preconfigured roles of the network 
nodes. 

a) Group formation process 

Group formation is handled by an authorized entity, e.g., a commanding officer or the command center. This is 
due to the fact that accepting members into the group will depend on the keys that these members have and are 
authenticated mainly by this authorized entity. 

The group formation is announced with the list of initial members. This list is included in the initial group 
announcement message. Initial member nodes, which are included in this message, must respond to the an- 
nouncing Multicast Group Leader (MCGL) and include their credentials for verification by the MCGL. The 
MCGL can then distribute the proper keys to the different members on the basis of their roles and hierarchy in 
the group. These keys will be used by the right members for both data access and further authorization to other 
group members to join this multicast group. 

b) Group access process 

The group access process deals with any subsequent node that wishes to join the multicast group. In this work 
we propose the well-known Role-based Access Control Model to control who can join the multicast group as 
well as who can have access to the encryption key that can be used to decrypt and subsequently retrieve the 
multicast message. Role-based Access Control (RBAC) has received considerable attention as a promising 
alternative to traditional discretionary and mandatory access control [28]. In RBAC, a role is a semantic con- 
struct forming the basis for the access control policy. Also, in RBAC, permissions are associated with roles. 
Users are made members of appropriate roles based on their responsibilities and qualifications, thus acquiring 
the permissions of these roles. In RBAC, users can easily be reassigned from one role to another. Roles can also 
be granted new permissions as the need arises, and permissions can be revoked as needed. This simplifies the 
security management process significantly [25]. Group access is therefore performed as follows: 
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• Any node that joins the group may be assigned, in addition to its functional role, the privilege to au- 
thorize other nodes that have a lower role to join the group. A node that becomes a member of the group 
is termed as Multicast Group Member (MCGM) 

• Any node that wishes to join the group sends an encrypted Route Request (RREQ) using the address of 
the MCGL. This message is encrypted with the MCGL public key. The node includes its initial role 
(default node role) in the RREQ message. It also includes its authentication credentials for verification 
purposes by authorizing nodes 

• The RREQ propagates according to the original protocol [27]. However, when a relay node that is not a 
member of the multicast group gets the RREQ, it cannot return a Route Reply (RREP) immediately. 
Instead, it has to keep propagating it up the tree towards the MCGL until it either reaches the MCGL 
itself, or a MCGM that both has a role that is superior to that of the requesting node and has been au- 
thorized by the MCGL to grant group memberships. 

• When an authorized node gets the RREQ and verifies the credentials of the requesting node, it sends 
back a RREP authorizing that node to join the Multicast (MC) group. This RREP message includes an 
authorization for this node to grant other nodes access to the MC group, if its role is high enough for this. 
The RREP is encrypted with the public key of the requestor. 

• If the requesting node does not have a suitable role, or its credentials could not be verified, the author- 
izing node sends back a Negative Acknowledge (NACK) message denying it access to the group. 

Figure 4 illustrates the process of group access in the proposed protocol. The figure shows two cases of nodes 
that wanted to join the MC group, node A and node B. Node A sends its RREQ message which travels towards 
the MCGL until it finds another node that has a role that is superior to that of A’s role. This node therefore sends 
node A the RREP allowing it to join the group. On the other hand, when node B sends its RREQ, it does not find 
a node whose role is higher in the hierarchy than its own role. The RREQ of node B therefore travels all the way 
until it reaches the MCGL. When the MCGL finds that a node B is not eligible to join the group, its sends it a 
“NACK” denying it access to the group. 

In the event that a node wishes to leave the group, it sends a Leave Request (LREQ) message to the MCGL, 
which will in turn, ensures that this node gets removed from the list of MC group members so that it does not 
receive any other communications. When the MCGL, however, decides that a node should move to another 
group, e.g., to add extra support to that group, it will have to transfer this node’s credentials to the commander 
of this group so that it can take care of adding it to its group. 

In the following section, we describe how data access is handled. This mainly depends on the underlying 
security arrangement as described in subsequent sections. 



56 



http://sites.google.com/site/ijcsis/ 
ISSN 1947-5500 




International Journal of Computer Science and Information Security (IJCSIS), 
Vol. 13, No. 9, September 2015 




Figure 4: Illustration of group access technique 

3 ) Data access process 

Data access strategy determines which Multicast Group Member (MCGM) can access a specific data item that 
is flowing within the MC group. By data, we mean any information that gets exchanged within the MC group, 
other than requests to join the group. 

The data access capabilities of a certain network node are determined as follows: 

• When a node joins the MC group, it gets an encryption key that is based on its rank and/or role within the 
operation. This key helps the node decrypts the data that is supposed to access only. 

• When some data packets are sent, they are encrypted in such a way that only a group of nodes with 
specific privileges within the MC are able to access these data. 

• Any packet that gets sent has a flag that specifies the role of the node that can access the data of this 
packet. This feature speeds up the processing of the packet. If a node is not authorized to access this data, 
it simply drops the packet. 

A node that tries to maliciously access the data of a packet, regardless of the role flag, will fail, as the key that it 
has for data decryption will not enable it to access these data. 

If the multicast group formation and member access are done on the basis of node roles and the privileges that 
are assumed by these roles, then the discovery scheme will be conducted based on the pre-formed group. 

The packet structure as well as the detail of the above service operations is described in the following 
sub-sections. 

C. Packet Structure 

Figure 5 describes the base packet structure as supported by the proposed protocol. 



1. Requester ► Commander 



IP header 


KP | 


| Payload j 


| MAC 



< > 

Encrypted 



Authenticated 
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2. Commander ► All Group members with the same role 



IP header 


KPx 


M 


| Payload 


"7 




< — 







Encrypted 

< > 

Authenticated 

IP header = source IP + destination IP 

Ks = secret “shared” key used to encrypt the message “payload”. 

KPc= commander’s public key 
KRc= Commander’s private Key 

KPx = other role members public key (x is a variable represents the selected role member) 
KRx = other role members private key (x is a variable represents the selected role member) 
Payload = information been exchanged between requester and commander 
MAC = Message authentication code used for message integrity 



Figure 5: Packet Structure 

All nodes use a standard way to advertise their services. Service description is to be understood by all nodes 
requesting these services. Table 1 shows the main elements of a discovery packet structure. 



Table 1: Discovery packet structure 



Attributes 


Description 


SCAT 


Service Category 


REQ ID 


Request ID 


SRequestor 


ID of requestor 


SProvider 


ID of provider 


SN 


Service Name 


SA 


Service Attributes 


NFA 


Non-Functional Attributes 


R LOC 


Location of service requestor (x,y) 


P LOC 


Location of service provider (x,y) 



The protocol supports the following packet types: 

• Service Request (SREQ): Sent by a requestor to seek a certain service. The main structure of the packet is 
as follows: SREQ (S_Requestor, REQID, S_CAT, SN, RLOC, SA, NFA) 

• Service Reply (SREP): Sent by service providers that are available to help, in response to a service dis- 
covery request. The main structure of this packet is as follows: 

SREP (SProvider, REQ ID, SN, S_CAT, PLOC, SA, NFA) 

• Service Reservation (SRESV): Sent by the commander to the provider to inform it that its services are 
required. The main structure of this packet is as follows: SRESV (S Requestor, REQ ID, SN, NFA) 

• Service Provider Acknowledgement (SACK): Sent by the service provider as a response to an SRESV 
packet informing that it can service the request. The main structure of this packet is as follows: 

SACK (S Provider, REQ ID, SN, P LOC) 

• Service Provider not available (SNACK): Sent by the service provider as a response to an SRESV packet 
to inform that it can no longer service the request. The main structure of this packet is as follows: 

SNACK (S Provider, REQ ID, SN) 
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• Group Access Request (RREQ): Sent by the service provider who wishes to join a group. The main 
structure of this packet is as follows: RREQ {Public Key, SProvider, S CAT, SN, SA, P Loc) 

• Group Access Response (RREP): Sent by the MCGL or the MCGM in response to a Group Access 
Request to grants MC group access to the requester. The main structure of this packet is as follows: 

RREP {Authorization, Secret key of the MC Group) 

• Group Access Denial (NACK): sent by the MCGL or the MCGM in response to a Group Access Request 
to denying it access to the MC group. The main structure of this packet is as follows: NACK {Denial to 
access MC Group) 

V. Service Discovery Scheme 

Network participants include commander nodes, service requestor nodes and service provider nodes. Figure 6 
shows an example of the interactions that take place between the different network nodes. The service discovery 
scheme is conducted by executing the following operations: 

• Service discovery operation: it is based on the node’s role and its authorization to view the communication. 
This includes the service discovery within the same group as well as discovering services that belong to 
other groups. Also, in this case it requires communication between commanders of different groups to 
support service discovery. The security in this case is achieved through the data access control described 
above. If a requestor node is not authorized to view the data it is requesting, the request will be denied. 

• Service selection operation: it is conducted by the commander node who considers some heuristics to select 
the appropriate service provider. In addition to service attributes, it considers the non-functional attributes 
of the service (e.g., service provider proximity). 



• Service reservation operation: this includes service reservation within the same group as well as service 
reservation from other group, which is performed via the two group commanders. 
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Figure 6: Service Discovery and Reservation Scenario 

A. Secure Service Discovery Operation 

When a station needs a certain service, it issues an SREQ message directly to its commander. The commander 
re-issues an SREQ message to its multicast group including the service-required parameters as well as is 
non-functional attributes and it waits for a response. If any of the service providers is available to provide the 
service with the required parameters and the non- functional attributes, it responds back with an SREP message. 
Otherwise responds with a “Service NACK” message. In these types of communications, security, through the 
encryption of messages among service requesters and commanders, plays a big role in keeping the exchanged 
information between the communicating parties confidential. Also, security, through the group and data access 
controls, ensures that only MCGMs can view and respond to the SREQ from the commander. 

B. Service Selection Algorithm 

Service selection operation is carried out after the service discovery operation is completed. A list of potential 
service providers who responded to the service discovery request is retrieved and classified by the commander. 
These available service providers can supply the same service. Therefore, a decision should be made by the 
commander as to which provider to select on the basis on the best match to the requestor’s requirements. One of 
the important criteria for service selection is considering the non- functional attributes of the service under 
consideration. 

1 ) Non-Functional Attribute Description 

There is a variety of non- functional attributes that are used in discovering services in the context of emergency 
operations. Non-functional properties of a service are qualitative attributes that evaluate the quality of service 
that a given service is providing e.g., service provider availability, and service delivery time. In our study, we 
consider only attributes that we see as most important in differentiating among providers who offer the same 
services. The following is a brief description of these properties: 

• Provider Proximity : the provider proximity is determined in relation to the service requestor; it is the GPS 
coordinates of the provider. This information is collected and stored in the commander node and updated 
whenever a provider changes its location. 

• Service Availability: is the percentage of service responsiveness. It is calculated as the total number of 
responses to the commander reservation requests over the total number of reservation requests during a 
given period of time. 

• Service Delivery Time (SDT): it is the time the provider takes to deliver the service to the intended re- 
questor. 

2 ) Non-Functional Attributes Measurement 

Each of the above mentioned non- functional attributes should be measured and maintained continuously as they 
might change over the time. As for the “availability” attribute, the commander periodically measures the 
availability of each service provider. It is measured as the degree of service responsiveness to the commander 
requests which includes both discovery and reservation requests. For the measurement of service proximity 
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attribute the operation is triggered on-demand whenever it is required to select a given service provider by a 
commander. However, for the measurement of service delivery time, the commander records the average time 
each provider spent in delivering a given service and updates it whenever the provider and/or the client receive 
an updated SDT. This may also include the effectiveness of the provider in servicing the client, the time spent in 
offering the services, and the quality of the provided service. Using the stored values of availability, service 
delivery time and proximity of all service providers allows the commander to differentiate among providers 
who offer the same services and select the best match. 

To protect against data tampering (unauthorized modification of data while in transit) or eavesdropping, our 
protocol utilizes digital signatures by the sender to provide data integrity and authentication of origin, and uses 
the shared secret keys between the providers and the commander for canceling the data. 

The following is a description of the selection algorithm and its main features. Consider the utility function F x 
that is associated with a service provider, X: 



F x = YZ=i(w n *^p), ( 1 ) 

where, Q x is an atomic non-functional attribute of a service S x , N is the number of non-functional attributes 
provided by a service S x , W n is the weight assigned for each non-functional attribute in the client request 
(0<W n < 1, W n = 1), [i is the average of the non- functional attribute values of a candidate service and a is 
the standard deviation of the non- functional attribute values of a candidate service. 

The ultimate goal of the commander is to select the service provider, which maximizes the utility function F x . 



C A Service Selection Example 

We now present an illustrative example of service selection based on the utility function of (l),to select the best 
request matching service from the list of service candidates. We use three service categories mainly medical 
personnel, Rescue workers, and Ambulances. Note that the number of service candidates within each category 
can be different. The commander node relies on three non- functional properties, namely, availability, service 
delivery time, and proximity, in selecting the best match service provider. Table 2 presents, for three categories 
of services, the values of the non-functional parameters used to calculate the utility function of each candidate 
service. The resulting utility functions are used to differentiate among services in order to select the service that 
maximizes the value of the utility function of these competing services. 

The non-functional requirements of client A may be expressed by a weighted vector for a service S t as follows,. 

NRF t = a P + a SDT + <p AV (2) 

where P represents the service proximity, SDT represents the Service Delivery Time, A V describes the service 
availability and a, a and cp are the weights assigned by a client X to each quality property per each request. 
These weights are governed by the following relationship, 



a + o + cp = 1 (3) 

The non- functional requirements of the client and the weights (a, a and cp) assigned to each property include: 
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• Proximity < 10, a = 0.4 (weight assigned to proximity property) 

• Service Delivery Time > 4 (it is defined as a scale between 0 and 5), a = 0.2 

• Availability > 85%, cp = 0.4 

The above weights are defined by the service consumer. They are selected based on the importance given to 
each of these properties. For instance a service consumer might give a high weight to the service proximity if 
he/she is in a very critical situation, and give a low weight to the other properties if they are not as important. In 
many situations the consumer’s opinion might be subjective. Therefore some correctives measures to adjust 
these weights could be considered. An appropriate way of deciding about these weights in this situation might 
include, in addition to the consumer rating, applying a fundamental theory for decision making problem that 
could adjust and average both resulted weights. 

The optimal selection of a service among candidate services is the one who maximizes the utility function 
value. In this sample, there are a total of nine Service candidates, each of which has different utility and 
non- functional attribute values. 



Table 2: Available service categories and their non-functional attributes 



Service Categories 


Medical personnel Services 


Rescue Services 


Ambulance Services 


Service Candidate 


New 

Medics 


Gulf 

Medics 


MEDEC 


Life 

Rescue 


Vital 

Rescue 


SOS AMB 


Fast 

AMB 


AMBD 


Insure 

AMB 


Availability (%) 


0.95 


0.99 


0.88 


0.78 


0.97 


0.93 


0.86 


0.96 


0.92 


SDT (Hours) 


4 


4 


5 


3 


5 


5 


4 


5 


5 


Proximity (Mile) 


5 


8 


10 


2 


11 


6 


13 


7 


3 


Utility (F x ) 


0.75 


0.82 


0.79 


0.76 


0.9 


0.89 


0.72 


0.92 


0.93 


Selected Services 


0.82 


0.9 


0.93 



The results we see from Table 2 show how our service selection scheme is able to differentiate between services 
based on the calculated utility function, which relies on the set of non- functional properties and their weights. 
For example, as shown in Table 2, among the medical personnel Services, the Gulf Medics service is selected as 
the best match service that maximizes the utility function (0.82). The same applies to Rescue and Ambulance 
Services where the Vital Rescue service and Insure AMB services were selected as best match services since 
they maximize the value of utility function of these two services to take the values 0.9 and 0.93, respectively. 

D. Service Reservation Operation 

Once the service discovery phase is completed successfully, the commander node assesses all the received 
responses in order to make its selection based on the response that best matches the requestor’s needs as well as 
other selection non- functional parameters. When it makes its selection, it issues an SRESV message directly to 
the best matching service provider. If the provider is still available to provide the service according to the 
required parameters, it responds back with a “service acknowledgement” (SACK) message, which would 
conclude this phase. However, if this provider is no longer available by the time it gets the reservation message, 
it responds with a “service NACK” (SNACK) message. If the commander node gets an SNACK message, it 
tries the next best provider on its list and so on. Finally, if it does not find any of them available anymore, it will 
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have to restart the service discovery process. As in the service discovery and selection, communications be- 
tween the commander and the matching service provider in the services reservation process (SRESV, SACK, 
and SNACK) are all protected by encryption to ensure confidentiality and by digital signature to ensure au- 
thenticity and integrity. 

The sequence diagram shown in Figure 5 describes an example scenario of the main interactions conducted to 
discover and reserve a service provider to fulfill a requestor’s need. The scenario assumes that the MCGM is 
established and the credential infrastructure is in place. It also assumes a situation where at least one suitable 
provider is available to handle the request. 

VI. Evaluation 

To evaluate the performance of the proposed solution, we conduct a series of experiments. We developed a 
simulator specifically for this purpose, and we used the multicast communication scheme that we presented 
before. To ensure the validity of simulation results, we replicated the same experiments 5 times using the same 
setup and configurations for all experiments scenarios. 

A. Environment Setup 

We extended the DisSERV simulator we have developed in [8] and we implemented a role-based multicast 
communication protocol within the DisSERV simulator. For mobile nodes, the random waypoint mobility 
model [26] is used in the experiments. 

The following are the default simulation parameters: 

• Simulation area is 1000 x 1000 square meters. 

• The size of the request packet is 128 bytes. 

• Service provider and requestor device communication range is 250 meters. 

• Network nodes are initially uniformly distributed over the simulation area. 

• We set the number of groups as following: "Surgeon”, "Nurse", "Ambulance", "FireWorker", "Worker", 
"Police", "medical", "rescue", "Supply System- Electricity". 

• Processing time for each service: {50, 100, 100, 100, 100, 135, 50, 50, 100, 130, 150} 

• Each service provider node moves at a speed of value that is randomly selected in the range of 1 to 6 m/s. 
It follows the random waypoint movement scheme. 

• The service requesting nodes are stationary. 

• The commander node is stationary. 

• The simulation experiment time is 2400 seconds. 

B. Experiment Description and Performance Metrics 

To evaluate main features of our solution we have conducted a series of simulations that measure specific 
metrics, as indicated in Table 3. We use the following metrics in our evaluation: 

• Discovery success: This metric measures the ratio between the services that were actually reserved 
(i.e., fulfilled requests) and the total number of unique service requests. 

• Request delivery ratio: This metric measures the ratio between the number of providers who received a 
certain request to the total number of providers who are expected to get it. 

• The success in controlling access to the group: This is done using two measures: 
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o The ratio of the number of denials to the total number of RREQs 

o The ratio of the number of allowed access to the total number of RREQs 

• The success in getting access permission: This is measured by the ration between the total number of 
unique access requests (by all roles) to the total number of access permissions. 

• The delay in getting access permission versus the different ranks (i.e. average delay per requestor 
rank). 



Table 3: Experiment description and used metrics 



Experiment Description 


Used Metric 


Formula 


The experiment tests whether a ser- 
vice-requesting node was able to find a 
service provider that meets its condi- 
tions located and reserved successfully. 


Discovery Success 
(DS). 


Ef=i SDREVt 
HUSREQt 


This experiment measures the ratio of 
the delivered requests to the total 
number of sent requests. 


Request Delivery 
Ratio. 


Eft- iSPR k 

PDR = 

E" i TPM l 

SPR: Number of Service Provider actually got the 
Service discovery Request (SREQ) 

TPM: Total number of Service providers who should 
get the message. 


Fraction of successful Denials and 
Fraction of successful Allowed Access. 


Success in control- 
ling access to the 
group. 


PD = number of denials/ Total number of RREQ 
PAA = number of allowed access/ Total number of 
RREQ 

T.UNACKL 

H N i=i PRE Qi 






Ef-i NACKt 

PAA = Z. 

lt =1 PREQi 


1. We randomly generate nodes with 

different permissions (roles). 

2. We generate group access requests 
(RREQ) using different roles 

3. Over the duration of the experiment, 

we count the number of RREPs vs. the 
generated RREQs (the repeated RREQs 
should not be counted) 

4. The success ratio is RREPs/RREQs, 
where RREQs can be repeated after 
timeouts. The repeated RREQs should 
NOT be considered in the calculations. 


Success in getting 
access permission. 


PA = # of RREP’s / (# of RREQ - repeated RREQ) 

ri _ Ef=i RREPi 

E ? =1 PREQ i -'Zf =1 RRREQ l 
RRREQ: Repeated RREQ. 
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1. We randomly generate nodes (roles) 
with different permissions. 

2. We select a specific role. We gener- 
ate RREQs from the members of this 
role. We count the average time it takes 
for the RREP to be received. 

3. Repeat #1 above for the other roles. 

4. As the role of the requestor gets 
higher, we should expect the average 
time for receiving the RREP to in- 
crease. 

5. If we get different results than what 
we expected in #4, we should investi- 
gate why. 



Time it takes to get 
access permission 
versus the rank of 
the requestor. 



TAP = average time (Time (RREQ) - Time (RREP)) 
per role 

Trreq : Time stamp when the RREQ request was sent. 

Trrep ’• Time stamp when the RREP response was 
received 

r is a role of given service provider 

T r Total time of getting access permission of a role r. 



TAP r = 



X'k ’T’ v-S t 

Li=l 1 RREP i ~ Li= 1 l RREQi 



T r 



C. Results and Discussion 
1 ) Service Discovery/Reservation success rate 

In this experiment, we evaluate the ability of the proposed technique to locate and reserve required service 
providers (i.e. finding providers). We set the number of available providers within each multicast group to be 
less than the generated requests. We gradually increase the number of generated requests to different groups and 
we verify if the discovery success is maintained. Figure 7 shows that the requestors are able, through the help of 
the commander node, to discover service providers with a very high discovery success percentage that reaches 
92%. Also, an average of around 83% of discovered requests were reserved. The remaining requests, about 
17%, fail to have a provider reserved. Obviously, not all discovered services would be reserved since some 
providers will become busy between the time they were discovered and the time they were sent a reservation 
request. Also, some reservation requests might fail to reach the providers since these providers might be moving 
frequently. According to Figure 8, as the number of requests increases, the gap between discovered services and 
reserved services increase gradually. This can be explained as follow. As more requests are generated, the 
processing and reservation process takes a longer time. During this period some of the providers may become 
busy due to other demand on their services, therefore some requests might be rejected or delayed until the 
provider becomes available. 
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Figure 7: Service Discovery and Reservation Success vs. the Number of Requests 




Figure 8: Service Discovery and Reservation Success vs. the Number of Hops 
In another experiment we change the discovery search diameter (i.e. search distance in number of hops) and we 
measure the service discovery/reservation success rate. Figure 8, shows that, as the number of hops increases, 
the percentage of successful provider discovery and reservation increases. This is intuitive since as we increase 
the area of search we should be able to find and reserve more resources and it shows that the proposed technique 
performs as expected. 

2 ) Request Delivery Ratio 

In the second experiment, we evaluate the ability of the proposed technique to maintain a good level of packet 
delivery ratio while varying the number of discovery requests. Figure 9 shows that the packet delivery ratio is 
around 0.9 with a number of requests relatively small (less than 100). Then, it decreases slightly and remains 
stable despite the increased number of requests. This proves that our solution maintains very high packet 
delivery ratio and packet losses are very small. These results demonstrate that the discovery packets will usually 
reach their destinations, which will not affect the discovery success of our solution. 
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Figure 9: Request Delivery Ratio over the Number of Requests 
3 ) Controlling Access to Multicast Groups 
In these experiments, we evaluate: 

• The ability of our access control scheme to grant access permissions to the different nodes (roles) that are 
willing to join four categories of groups, and, 

• The average time it takes to get access permissions for members belonging to each group. 

For both experiments we randomly generate a number of RREQs, in the range of 20 to 80 requests. We then 
measure the PA and the TAP ratios as described in Table 3. We run each experiment five times and we use the 
average of the results of all runs. Figure 10, shows the access permission success ratio of four groups, namely, 
the commander, head of unit, medical, and rescuer groups. The graph illustrates that the ratio of successful 
group access permission is high for the RREQ requests that are issued to join the rescuer group, which has the 
lowest role in the hierarchy. This is due to the fact that RREQs for lower roles can be approved by many more 
members of the team than RREQs for higher roles. 




Figure 10: Success ratio in controlling access to each group 



As for the average time that it took to process the requests to join different groups, the experiments showed that 
the average time increases, as the level of the group gets higher. The reason is that when, for example, the 
lowest level group access is required, and all members of groups above this group can give access to it. The 
number of members who can give access to a group decreases as the level of the group increases. This is 
reflected directly on the time it takes for granting access. Figure 1 1 illustrates the trend followed in this case. It 
is clear from the figure that the time increases as the group level increases as we discussed. 
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Figure 11: Response time in granting access (Join) the each group 



Figure 12, illustrates the response time measured as the difference between sending a service request (SREQ) 
and receiving a service response (SREP). We average this measure for each number of generated requests and 



number of nodes in the network. The graph shows clearly that the response time values are considerably low 
and relatively stable even with the increased number of requests/nodes. They increase slightly and then stabilize 
around a constant value. This proves that our service discovery scheme maintains a prompt service delivery 



time while scaling with the number of requests and number of nodes deployed on the network. 



8 

1 



2 




[ 100 , 20 ] [ 200 , 40 ] [ 300 , 60 ] [ 400 , 30 ] 



[Number of Nodes, Numberof Requests] 



Figure 12: Average response time versus number of requests and number of nodes in the network 



D. Qualitative Comparison of different Service Discovery Architectures 

We study in this section few examples of existing architectures [2], [3], [8] that addressed the issue of service 
discovery and selection in specialized mobile networks. This comparison is based on a set of properties that are 
relevant to service discovery in special military operations e.g., security, reliability, scalability, mobility, and 
discovery success. The table below details the results of this comparison. 
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Table 3. Qualitative comparison of service discovery architectures based MANET 



Service dis- 

covery solu- 
tions 


Reliability (in 
terms of discov- 
ery success) 


Security features 

(confidentiality, 

authentication) 


Scalability (in 
terms of number 
of requests) 


Communica- 
tion protocol 
used 


Mobility 


Suitability for 

Military operations 


Our proposed 
technique 


High Service 

Discovery 

success. 


Security is fully 
implemented at 
different levels 
(e.g., communica- 
tion, discovery, 
etc.) 


Highly scalable 
as it allows 
request/services 
delegation. 


Multicast 
based wire- 
less commu- 
nications. 


Addressed 


Highly suitable; 

designed specifi- 

cally for military 
interventions. 


EmerLOC [2] 


No service 

discovery is 

involved. 


Security is par- 
tially addressed 
by using Public 
Key Infrastructure 
(PKI) and making 
use of LDAP. 


Not considered. 


Wireless, 

GPRS/GSM 

based. 


Addressed 


Not designed 

specifically for 

Military operations. 


CodeBlue [3] 


Data and naming 
discovery. 


Partially ad- 

dressed using a 
decentralized 
security scheme. 


Scalability is 

evaluated by 

number of 

sensors. 


Wireless 

Communica- 

tion. 


Addressed 


Not designed 

specifically for 

military operations 


DisSERV [8] 


High service 

discovery suc- 
cess. 


Not considered. 


Considered. 


TCP/IP based 
protocols. 


Addressed 


Not designed 

specifically for 

Military operations. 



The above comparison shows that our proposed architecture for service discovery fulfills all the qualitative 
requirements for special military operations unlike the other architectures that are not designed specifically for 
this purpose. The completeness of our architecture relies on its adaptability to the context of military operations 
as it ensures the confidentiality and integrity of communicated information as well as the access control and 
management. For example, with regards to security, our architecture maintains it at different levels, as it ensures 
confidentiality and integrity of data in addition to the access control feature. 



VII. Conclusion 

Emergency response and military operations are of critical nature in terms of the time constraints for delivering 
help to parties in need. As such, managing the discovery and allocation of resources in a timely efficient and 
secure manner is of utmost importance when planning and conducting such missions. In this paper, we intro- 
duced a comprehensive secure role-based service discovery system to use in special operations. The system 
ensures that service requests are allowed for operation members with the right permissions only. The system 
uses a MANET multicast communications system thus conserving on the communications overhead while 
overcoming the likely situation of the lack of networking infrastructure. The elements of the system include 
credential establishment, access control, data and group access, service discovery and service selection and 
reservation components. 

The results of system evaluation showed high discovery and reservation success rate. The results also show that 
the security arrangements, as established within our architecture, ensure secure data delivery and integrity. We 
also qualitatively compared our architecture to other architectures for special operations, which illustrated the 
merits of our proposed system. 
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Abstract: 

This paper presents methods for vehicle segmentation. The camera can be fixed or moving which can 
be used to capture the moving vehicle. During this process, the orientation of the vehicle captured can be in any 
direction. Many segmentation methods available. However, K-Means with Fuzzy logic can be still more 
appropriate in segmenting the vehicles moving on the road. 

1. Introduction 

Vehicle segmentation is an important process in the image processing domain. Only after proper 
segmentation of the vehicle from the road and other sceneries background, the segmented image can be further 
used for template matching, tracking and identifying the type of vehicle. 

Kong, 1998, implement the motion segmentation algorithm based on Galilean wavelets. These wavelets behave 
as matched filters and perform minimum mean- squared error estimations of velocity, orientation, scale and 
spatio-temporal positions. This information is finally used for tracking and segmenting the objects. They 
claimed that the algorithm is robust, it can deal with temporary occlusions and by tuning a threshold it can 
estimate the number of moving objects in the scene. 

Tsai et al, 2007, implement a color transform model to detect vehicles using color and edges. The color 
transform model identifies the pixels from the background and the corner, edge map. The wavelet transforms 
construct an cascade multichannel classifier. This approach eliminates most background pixels in advance to 
make easy to detect the vehicles. 

Chen, 2006, exploit the wavelet decomposition in order to reduce the typical noise problem of image difference 
based approaches. The image difference is computed on the low frequency sub-image of the third level of the 
discrete wavelet transform (DWT). On the extracted blobs they perform morphological operations and extract 
the color and some spatial information. Each blob is associated with a descriptor that is used to track the objects 
through the sequence. 

Ahad Karimi Moridani et al., 2015, present a powerful algorithm of computer vision methods to traffic flow 
monitoring, vehicle detection and traffic analysis, which intend to develop the vehicle count system using an 
image processing technique in CCTV video outputs. This software-based vehicle counter can detect all vehicles 
through images instead of using expensive electronic sensors or cameras embedded in the sidewalks. This 
system processes captured video, detects vehicles in each frame, classifies the vehicles into four types and 
counts all of them, all by image/video processing techniques. 
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2. Problem statement 

Many image processing techniques still find difficulties in segmentation of images. The segmentation is 
becomes clear only when the image is taken in a controlled condition. However, in case of vehicle tracking, 
there is no controlled condition of lighting, and hence a clear image is always a difficult job. It is difficult task to 
segment such image clearly. 

3. System setup 

a. Acquire image 

b. Create contrast in the image 

c. Segment the image using K-Means with Fuzzy Logic 

d. Compare with the template and identify the type of vehicle. 

4. Description and Results 

a) Acquiring image: Image has to be acquired with good quality camera. The various information in the image 
has to be separated from the background of the image. In order to achieve this, there should be a clear contrast 
among objects against the background in the image. Mostly a clear contrast does not present in the image. 
Hence, contrast has to be created in the image. 



b) Contrast: A contrast in the image can be changed by first obtaining the histogram of the image horizontally 
and vertically. 
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In the table shown the first row has an image with good contrast where in the vehicle object is clearly visible 
against the background. The histogram image is properly bell shaped and the major portion of the histogram is 
cantered. However for the image shown in the second row, the image is not having good contrast. Hence the 
corresponding histogram has three locations where the intensities are concentrated. This has to be contrast 
enhanced so that, the objects are clearly visible against the background. The image is contrast adjusted and 
presented in the third row. The corresponding change in the histogram is shown. 

C) K-means algorithm 
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Fig.l K-Means algorithm 

K-Means is a statistical method that uses distance concept in clustering group of adjacent pixels with 
very close similar intensity values. In general, the intensity values and their pixel locations are considered for 
grouping. A manual centre initialization procedure is used for segmenting the image. Only when the number of 
group of objects and the object centres are clearly known in advance, then a better segmentation can be 
achieved. 



D) Fuzzy logic with K-Means (FLKM) 

Vehicle images have different brightness and contrast and different clarity. These variations are 
considered to be fuzzy and hence, the pixels representing the objects are considered fuzzy. Hence, fuzzy logic 
with K-Means has been combined to segment the image. 

Fuzzy Logic (FL) is a multi-valued logic that allows intermediate values to be defined between 
conventional evaluations like true/false, yes/no, high/low. Fuzzy systems are an alternative to traditional notions 
of set membership and logic. 

The training and testing fuzzy logic with K-Means is to map the input pattern with target output data. 
For this, the inbuilt function has to prepare membership table and finally a set of number is stored. During 
testing, the membership function is used to test the pattern. 

Figure 2 presents a schematic architecture for training and testing FLK-Means or segmenting a texture 
image. Training FLKM involves inputting feature patterns from CC and target values for segmenting texture 
image. The sugeno type inference system available in the Matlab software generates inference rules using 
subtractive clustering. The final sets of trained values are stored in a file. In the testing process of FLKM, the 
image is input to FLKM module. The final weights are processed with features and an output is obtained. This 
output is compared with a threshold to assign segmentation values to the image. 

Training FLKM 

Step 1: Input the k-Means segmented image. 

Step 2: Create Fuzzy membership function. 



74 



http://sites.google.com/site/ijcsis/ 
ISSN 1947-5500 



International Journal of Computer Science and Information Security (IJCSIS), 
Vol. 13, No. 9, September 2015 



Step 3: Create clustering using K-Means algorithm. 

Step 4: Process with target values. 

Step 5: Obtain final weights. 

Testing FLKM for texture segmentation 
Step 1: Input the k-Means segmented image. 

Step 2: Process with Fuzzy membership function. 

Step 3: Find the cluster to which the pattern belongs. 

Step 4: Obtain estimated target values. 

Step 5: Segment the image 

RADII specifies the range of influence of the cluster centre for each input and output dimension, assuming the 
data falls within a unit hyperbox (range [0 1]). Specifying a smaller cluster radius will usually yield more, 
smaller clusters in the data, and hence more rules. When RADII is a scalar it is applied to all input and output 
dimensions. 




Fig.2a Training and Testing FLKM for segmenting the image 
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Fig.2b Training and Testing FLKM for segmenting the image 




input FCKM segmented image 

Fig.3 Uncontrasted image segmented 





input FCKM segmented image 

Fig.4 Contrasted image segmented 

Figure 3 shows the vehicle image ( less contrast) segmented by FCKM. There is no clear visibility of the vehicle 
in the segmented image. Figure 4 presents the well contrasted image. The segmented image shows the presence 
of a vehicle. 
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5. Conclusion 

This paper presents methods of normalizing and segmenting vehicle images with better clarity. K- 
Means algorithm is used for initial segmentation of the vehicle image. Subsequently, the segmented image is 
input to the Fuzzy logic module for getting the final segmented image. From the segmented image, the presence 
of a vehicle is identified using template matching. 
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ABSTRACT- Designing a classifier in a multiple-label 
classification issue in which the number of features used for 
describing each sample is high and the number of samples is 
low, is faced with many problems. The features describing each 
sample can be divided into three categories: relevant, 

irrelevant and redundant. Redundant and irrelevant features 
could seriously influence the accuracy of classification in such 
issues. In spite of works carried out, the issue of feature 
selection in multiple-label classification issues is still 
considered as a challenge. For this reason, the issue of feature 
selection has been discussed in this paper. The aim of feature 
selection problem is finding a subset of features in order to 
modify and improve the accuracy of estimation without loss of 
accuracy that classifier using the selected features performs the 
classification of data. This study was aimed to enhance the 
efficiency of classification through decreasing the error and to 
increase classification speed through selecting a subset of 
effective features in classification of multi-label data as well as 
using mutual information, Genetic Algorithm and Rank SVM 
classification The performance of proposed method has been 
evaluated on three data collections including Emotions, Scene 
and Yeast that are available on Mulan website. 
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I. Introduction 

Numerous studies have been carried out about single 
label. However, every record might correspond with some 
classes in real world that is called multi-labels. In single- 
label classification, each sample is related to a label, but in 
multi-label classification to one or more labels. In other 
words, in multi-label classifications, the class of each 
sample is determined by a vector of labels. In many real 
applications, the data is also multi-label one. Multi-label 
classification is a special supervised learning issue, in which 
any single instance could be possibly associated with several 
classes simultaneously and thus the classes are no longer 
mutually exclusive. Recently, it has been paid more 
attention to than before due to lots of real-world 
applications, e.g., text categorization, scene and video 
annotation, bioinformatics, and music emotion 



categorization [l].The actual data set reduces significantly 
the degree of precision and accuracy due to the large 
number of irrelevant and redundant features. In addition, the 
more the number of labels, the more difficult learning multi - 
label data. Accordingly, the problem with features arises. 
Features are divided into the three categories: related, 
unrelated and redundant [2]. Related features are counted as 
the features influencing the output; there is a correlation 
between these features and class label, determining class 
label, selecting efficient features to be classified. Unrelated 
features do not express any information about the class 
label. In other word, there is no affiliation between these 
features and class label, so they should not be affected in 
classification. Redundant features are the ones which do not 
provide additional useful information to selected features 
available, so they can be removed from the existing feature 
set, because there are a number of other features in the 
subset that can express the same information. In the 
selection feature problem, the aim is to find one subset of 
the features to correct and recover the accuracy and 
precision estimates, without loss of accuracy, which the 
classifier classifies the data using selected features of 
classification. In this article, the attempts were made to 
select a subset of the features of the multi-label data 
classification to improve the performance classification by 
reducing errors and to increase the speed of classification. 

II. Topics studied in this article 

A. Feature Selection 

The problem of feature selection is considered as one of 
the issues that is proposed in machine learning discussions 
and statistical pattern recognition. This is of high 
importance in many applications (such as classification), for 
there are a lot of irrelevant and redundant features which 
contain little information, removing them does not cause a 
problem, but increase the computational load. In addition, 
this makes a lot of useless information to be stored along 
with useful data. 

B. Feature selection algorithms 

In this section we peruse different methods of selecting 
a subset of features. 
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I ) Filter method 

In filter algorithms, the search process is independent of 
any classification algorithm. The goodness of feature 
subsets are evaluated based on a certain criterion like 
distance measure, information measure and consistency 
measure [3]. 



2 ) Wrapper method 

In a wrapper model, the feature selection algorithm 
exists as a wrapper around a classification algorithm and the 
classification algorithm is used as a “black box” by the 
feature selection algorithm. The performance of the 
classification algorithm is used in the evaluation function to 
evaluate the goodness of feature subsets and guide the 
search [3]. 



C. Genetic algorithm 

Artificial intelligence research within the computer 
science field produced GA, a heuristic search tool designed 
to mimic the natural process of evolution. This heuristic, or 
so called meta-heuristic, is commonly used to generate 
useful solutions for optimization and search problems, often 
employing the natural techniques of evolution, such as 
inheritance, mutation, selection and crossover. John Holland 
developed the formal theory of GA in the 1970s, and 
continued improvements to the price and performance value 
have made GA attractive for many problem-solving 
optimization methods [4]. Genetic algorithms are a type of 
optimization algorithm, meaning they are used to find the 
optimal solution(s) to a given computational problem that 
maximizes or minimizes a particular function [5]. GA has 
been shown to perform well in mixed (continuous and 
discrete) combinatorial problems. Although GA easily 
become trapped in local optima, they are computationally 
expensive and a probabilistic one. A GA begins with a set of 
solutions represented by a group of chromosomes called the 
population. A new population can be generated by 
borrowing solutions from the current population or by 
applying genetic operators such as selection, crossover, and 
mutation to current population. The new population must be 
better than the old one. 

The function of genetic operators warrants more 
detailed attention. The selection operator picks two parent 
chromosomes from the population based on their fitness to 
participate in the next operations, crossover and mutation. 
These steps are considered the most important in a GA 
because they have a positive impact on the overall 
performance. First, parents form new offspring (children) 
through crossover probability. Shortly after, the mutation 
operator randomly exchanges alleles, as occurs in nature. To 
work well, GA require the definition of three important 
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aspects: the objective function, the genetic representation 
and its implementation, the genetic operators and their 
implementation. [4] 

D. SVM algorithm 

Let a binary i.i.d. training set of size 1 be 
{(xi, yi), ... , (x i# y0, ... , (x 1# yO) where x^X and y { E 
{+1,-1} denote the ith training instance vector and its 
binary label. 

In the original space, a linear discriminant function is 
defined as f(x) = w t x -I- b .where w and b are the weight 
vector and bias term respectively. The primary problem of 
SVM for the nonlinearly separable case is formulated as 

Miniw r w + C Yi \= i %i, s. t.y i {w T x i + b) > 1 — 

o, 

i = 1 1 (1) 

where C>0 denotes a regularization constant to 
control the tradeoff between classification errors and 
model complexity, and ^ imn > 0 are slack variables to 
indicate misclassification errors. 

E. Rank SVM algorithm 

In classification for multi-label data using Rank SVM 
algorithm [6,7] in the original input space, q linear 
discriminant functions are defined as 

fk(x) = w k T x + b k ,k = 1, ...,q (2) 

Where w k and b k denote the weight vector and bias term 
of the k th class respectively. Therefore, in Rank-SVM, the 
relative relationship between any relevant label and any 
irrelevant one for some training instance xi, is described 
using the following pairwise constraint: 

/mOi) - /nOi) = (W m - W„) r Xj + (b m ~ (3) 

bn) >1 timrv (.Tn, tt) ^ * ^i) 

Where the slack variable ^ imn > 0 is added to force the 
difference to be equal to or more than 1 . 

As shown in Fig. 1 forp = 1, there exist three possible 
cases: 

(a) /mOi) - fn(Xi)>l and % imn = 0, which is a perfect 
situation 

(b) 0 < fm(Xi) - f n (Xi) < 1 and 0 < f 

imn ^ 1 

(C) /ntOi) - /„(*;) < o and ( imn > 1 
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Rank-SVM: p= 




irrelevant label 

(a) (b) 



(c) M*) 



Figure 1. Three possible relative relationships between a pair 
of labels in Rank-SVM (p = 1).[1] 

III. The proposed method 

In this paper, the genetic algorithm was considered to 
solve the problem of feature selection in the field of multi - 
label sample classification, within the framework of the 
wrapper. In this case, prediction of multi-label samples is 
made through an appropriate threshold t(x). Since the 
evaluation of a multi-label algorithm is more complex than 
that of a single-label Algorithm, three evaluation criteria 
were applied in the present study: one -error, average 
precision and ranking loss. Multi-label support vector 
machine was used to classify the data, and establish relative 
relationship between each related label and each unrelated 
label. 

To select the features using a genetic algorithm, the 
following procedure was followed: 

Creating random population with n pop size and evaluating 
them 

Selecting parents and combining them to create a population 
of children as n c 

Selecting members of the population for enforcing mutation 
and creating mutant population as n m 

Integrating the original population, children and mutants, 
and creating new original one. 

In the present study, the probability of selection was 
calculated by the use of = e”^ Ci in which p was selective 
pressure as parameter and Q was the cost of ith selection. 
The three cross over ways (i.e., single-point crossover, two- 
point crossover and uniform crossover) were adapted to be 
used and through roulette wheel selection (RWS), the 
parents were selected from the main population. To do so, it 
is attributed to each of these ways probability which were 
considered here as follows: 

pSinglePoint = 0.1; 
pDoublePoint = 0.2; 

pUniform = 1 — pSinglePoi — pDoublePoint 



To enforce the mutation, an element was randomly 
selected from the input vector, being completed (one was 
changed to zero and zero to one). 

Suppose there is an original population with a number 
of n pop members from which the population of children is 
selected according toO < p c < 1 , n c = p c n pop , the 
population must be paired, and then n c can be rewritten as 
n c _ 2 | Pc n pop | anc j crow( j 0 f mu tants was selected asn m . 

All these members were integrated together so that it 
equaled n pop + n c + n m . It was ordered, and due to 
elitism, n pop was selected as the best choice from the 
beginning of the population. In this case, the new generation 
and the population were organized. 

Let X E R d be a d-dimensional input space with 1 
features and C set of all output classes, find out the subset 
S^X with k features and Q = (1,2, a finite set of 
class labels, where q is the number of class labels. Further, 
assume that each instance xeX can be associated with a set 
of relevant labels L Q 2^ . At the same time, the 
complement of L, i.e., L = Q \ L , is referred to as a set of 
irrelevant labels of x. Given a training data set of size 1 
drawn identically and independently from an unknown 
probability distribution (i.i.d.) on X x 2 Q , i.e., 

{(x 1; Lf ), , (x^ , (x\, Li)} . Given some instance xi, its 

q discriminant function values and predicted set of 
relevant labels from some multi-label classification 
algorithm are denoted by f^(xj),k= 1, ...,qandL? Q 2 Q 
respectively. 

Also any relevant label should be ranked higher than 
any irrelevant one [7]. In this case, the multi-label prediction 
can be fulfilled through a proper threshold t(x), 

f(x) = { k I f k (x) > t(x) , k = 1, , q] (4) 



IV. The evaluation and compare results 

To evaluate the proposed method we use data sets in 
Table (2) which includes three data sets Emotions, Scene 
and Mulan Yeast, which have been published in the 
database [8]. 



TABLE I. Data sets used in our experiments. 



Datas 

et 


Domain 


Instances 
Train Test 


Features 


Classes 


Emotion 


Music 


391 


202 


72 


6 


Scene 


Scene 


1200 


800 


294 


6 


Yeast 


Biology 


1500 


917 


103 


14 
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Since the proposed method is based on the 
classification labels. Therefore, the measures contained in 
the multi-label data classification criteria of an error, a drop 
in ratings and an average accuracy of classification based on 
the methods used order to assess their use [1]. 

The one error: evaluates how many times that the top- 
ranked label is not one of relevant labels: 

One error = {k <2 L t |/ k p (x t ) = (5) 

max fc , eQ /^(Xj)}| e [0,1]} 

The ranking loss: calculates the average fraction of labels 
pairs (a relevant label versus an irrelevant one) that are not 
correctly ordered for the instance: 

Ranking loss = ^"=1 \{(k, k') e (6) 

tti X < fkX x i)\) e [0,1] 

The average precision: compute the average fraction of 
labels ranked above a specific relevant label kALi , which 
actually are in Li, i.e 

Average precision 

_ iy / i r |{ fc ' e k\ dfad y fk 

~ m {k\ lLil £ J L i l( fc ' € Wto) ^ 1/ 

e [0,1] (7) 

It is favorable that a multi-label algorithm should 
achieve a larger value for the average precision, and smaller 
values for the other four measures. 

The process for evaluating the proposed method is that 
each dataset was tested separately in MATLAB in terms of 
each criterion by using the proposed method (FSS Rank 
SVM), and characteristics of each dataset was reduced by 
the use of proposed method. 



TABLE II. Number of features obtained by test proposed 

method (Fss Rank svm) 



\ measure 

Dataset 


Features 


Feature 
selection by 
Ranking Loss 


Feature 
selection by 
One Error 


Feature 
selection by 
Average 
Precision 


Emotion 


72 


34 


30 


45 


Scene 


294 


93 


115 


102 


Yeast 


103 


64 


63 


61 



each dataset, given that the three criteria have been used, 3 
new datasets were produced). In the second process, each of 
the three primary datasets was tested with Rank SVM for 
multi-label classification, and the errors of the three 
evaluation criteria were obtained. Then, each reduced 
dataset was tested again by Rank SVM, and results were 
compared with the results of the primary dataset. The results 
of the tests carried out for each dataset have been recorded 
in a separate table as following. 



TABLE III. Compare the results Rank svm for Yeast dataset and 

new dataset with reduced features using proposed method 



N. measures 

Data set N. 


Ranking 

Loss 


One 

Error 


Average 

Precision 


cost 

Feature 

selection 


Rank Svm by 
103 feature 


0.1703 


0.2366 


0.7508 




64 feature with Feature 
selection by 
Ranking loss 


0.1644 


0.2116 


0.7635 


0.1644 


63 feature with Feature 
selection by 
One error 


0.1712 


0.2137 


0.7578 


0.2137 


61 feature with Feature 
selection by 
Average Precision 


0.1675 


0.2105 


0.7620 


0.7620 



TABLE IV. Compare the results Rank svm for Scene dataset and 

new dataset with reduced features using proposed method 



measures 

Data set 


Ranking 

Loss 


One 

Error 


Average 

Precision 


Cost 

Feature 

selection 


Rank Svm by 294 
feature 


0.3533 


0.6421 


0.5588 




93 feature with Feature 
selection by 
Ranking loss 


0.0820 


0.2485 


0.8549 


0.0820 


115 feature with Feature 
selection by 
One error 


0.0980 


0.2349 


0.8509 


0.2349 


102 feature with Feature 
selection by 
Average Precision 


0.0931 


0.2425 


0.8507 


0.8507 



Then a new dataset was obtained through feature 
selection based on genetic algorithm to each criterion. (For 
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TABLE V. Compare the results Rank svm for Emotions dataset 

and new dataset with reduced features using proposed method 



measures 

Data set 


Rankin 
g Loss 


One 

Error 


Average 

Precision 


cost 

Feature 

selection 


Rank Svm by 72 
feature 


0.393 

7 


0.584 

2 


0.5747 




34 feature with 
Feature selection by 
Ranking loss 


0.257 

3 


0.402 

5 


0.7099 


0.1755 

2 


30 feature with 
Feature selection by 
One error 


0.246 

4 


0.400 

0 


0.7141 


0.2376 

2 


45 feature 
With Feature 
selection by Average 
Precision 


0.475 

8 


0.575 

0 


0.5533 


0.5747 



Comparing the results, it was concluded that in the 
proposed method, since the proposed algorithm reduces the 
number of features, cost classification is reduced in general 
by reducing the number of features. 
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other Meta-heuristic Algorithms in feature selection on 
multi-label data. 
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V. Conclusion 

In a multi-label classification problem, where the 
number of attributes used to describe each sample is a lot 
and the sample size is small, numerous problems are to be 
faced to design a classifier. Features describing each sample 
can be categorized under three heads of related, unrelated 
and redundant features. Redundant and irrelevant features 
may strongly have an effect on the accuracy of classification 
in such problems. In this article the problem of feature 
selection in the multi-label classification problems were 
discussed. In the feature selection problem, it is aimed at 
finding one subset of features to correct and recover 
precision and accuracy of estimates, without reducing the 
accuracy of the classification accuracy which enables the 
classifier to classify data using selected features. In this 
study, using genetic algorithms and Rank SVM 
classification, a subset of the features effective on multi- 
label data classification have been selected, so that the 
efficiency has been increased by reducing error. 
Performance of the proposed method was assessed on three 
datasets Emotions, Scene and Yeast. Comparing the results, 
it was concluded that in the proposed method, since the 
proposed algorithm reduces the number of features, cost 
classification is reduced in general by reducing the number 
of features. It is proposed for future study to make use of 
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Abstract 

This paper presents the method of tracking vehicle in video frames using Locally 
weighted projection regression (LWPR). The coordinates of the segmented vehicle image are 
presented to the LWPR. Based on the coordinates of the previous video frames, the LWPR 
estimates the next position of the vehicle. The LWPR is trained with coordinates of the 
vehicle obtained from few frames. Based on the learned information the next movement of 
the vehicle is estimated without processing next few video frames. 

1. Introduction 

Vehicle tracking is an important area that is very much useful for tracking vehicles 
from helicopters, tracking the movement of objects from remote sensing, tracking the 
movement of vehicles on the road in a crowded traffic environment. To achieve, the tracking 
pre-processing of images have to be carried out. This includes identifying the presence of 
vehicles in a frame, identifying frames that do not have vehicles images, identifying non- 
vehicle images that can be anything from the road to post lamp. Image processing plays an 
important role in processing the frames of the video, and proving with the presence of the 
vehicle in a frame. 

2. Related work 

The object tracking in video processing is an important step to tracking the moving 
objects in visual-based surveillance systems and represents a challenging task for researchers 
[Porikli and Yilmaz, 2012]. To track the physical appearance of moving objects such as the 
vehicles and identify it in dynamic scene, it has to locate the position, estimate the motion of 
these blobs and follow these movements between two of consecutive frames in video scene 
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[Rhee, 2004]. Several vehicle tracking methods have been illustrated by several researchers 
for different issues; it consists of: 

1) . Region-Based Tracking Methods. 

2) . Contour Tracking Methods. 

3) 3D Model-Based Tracking Methods. 

4) . Feature-Based Tracking Methods. 

5) . Color and Pattern-Based Methods. 

Gupte et al., 2002, introduced a model-based automobile recognizing, tracking and 
classification that is efficiently working under most conditions. The model provides position 
and speed knowledge for each vehicle as long as it is visible, also, this model works on series 
of traffic scenes recorded by a stable camera for automobiles monocular images. The 
processing algorithms of this model represent of three levels: raw images, region level, and 
vehicle level. 

Jin-Cyuan, 2010, introduced a traffic criterions assessment such as vehicles 
numbering and classification involving with a suggested traffic observation scheme. The 
scheme demonstrated in its work the feature ratio and density to classify vehicles, also, it 
used the geometric traits to eliminate the false regions and for more accurate segmentation 
process is used the shades elimination algorithm. 

Roller, 1994, stated that contour tracking methods depend on contours of the vehicle 
in tracking vehicle process. Ambardekar, 2008, used real-time traffic supervision approach 
that employs optical movement and uncalibrated camera parameter knowledge to detect a 
vehicle pose in the 3D world. The approach uses two techniques: color contour based 
matching and gradient-based matching, and it showed results when it tested for tracking, 
foreground object detection, vehicle recognition and vehicle speed assessment methods. 

A real-time vehicles tracking and classification technique on the highway is useful. A 
few traffic criterions are extracted by the above technique. Also, the technique supports the 
occlusion detection and tracking that cause from multiple vehicles poses in the crowding 
situation. The method uses the Kalman filter, background [Monnet, 2003] differencing 
methods and morphological operations for extraction and recognition vehicle's contour. 

In the 3D Model-Based Tracking Methods, Yung and Lai, 1998 presented an 
occlusion detection approach based on the generalized deformable model. The occlusion of 
vehicles detection process use a 3D solid cuboid form with up to six vertices, and this cuboid 
used to fit any different types and sizes of vehicle images by changing the vertices for the 
best fit. 
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Hsieh, 2006, used feature-based tracking methods using a linearity feature technique, which 
is a line-based shade method that uses lines groups to remove all undesirable shades. The 
method undertakes well the occlusion resulting from shades. This method represented an 
automatic vehicle tracking and classification traffic observation system. 

In color and pattern-based tracking method Mao-Chi and Shwu-Huey, 2004, analyzed 
colors of image series in traffic supervision. The technique uses the YCrCb color space for 
the construction preliminary background, segmenting foreground, vehicle location, vehicle 
tracking, shade elimination, and background updating algorithms that used the system. 

The limitation of the camera in vehicle tracking is as follows: 

1. The camera can cover only small distance because of factors such as road configuration 
(e.g., elevation changes, curvature, and overhead or under pass structures), congestion level, 
vehicle mix, and inclement weather vertical and lateral viewing angles. 

2. The number of lanes observed. 

3. Stability with respect to wind and vibration, and image quality. 

3. Methodology 

LWPR involves projection regression to find mapping of input data with output data. 
Locally Weighted Projection Regression (LWPR) is an algorithm that achieves nonlinear 
function approximation in high dimensional spaces even in the presence of redundant and 
irrelevant input dimensions. At its core, it uses locally linear models, spanned by a small 
number of univariate regressions in selected directions in input space. This nonparametric 
local learning system 

i) learns rapidly with second order learning methods based on incremental training, 

ii) uses statistically sound stochastic cross validation to learn, 

iii) adjusts its weighting kernels based on local information only, 

iv) has a computational complexity that is linear in the number of inputs, and 

v) can deal with a large number of-possibly redundant & irrelevant-inputs. 
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action = EVALUATE action TRAIN RESULT 

Fig.l Flowchart of Locally Weighted Projection Regression 

Figure 1 indicates general flow chart for implementing the LWPR algorithm. 



4. Results and discussions 
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Fig.4 Error for learning x-y coordinates error 

Figure 4 shows error while training the LWPR for x-y-difference vector. The 
convergence rate comes close to 0.022 in more than 25 iterations. 
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Fig.5 Estimation of vehicle location by LWPR 

Figure 5 presents the actual location of the vehicle in the frame and the estimated location by 
LWPR. In this limited frames, the LWPR performs best in estimation. 



5. Conclusion 

In this work, the segmented vehicle images are considered. The x,y coordinates of the 
frames were used as input data for training the LWPR algorithm. The final weights obtained 
after training the LWPR has been used for estimating the next vehicle position the subsequent 
vide frames. 
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Abstract — Mobile establishment of masts is an exclusive 
competence a control power, basically regarding town planning, 
mobility characterizes what could move or be moved, which can 
change place, this multilevel extensible, dynamic notion intuitive 
the activity however by three different aspects, and as many 
approaches, ‘nomadisme’, ubiquity, the sensitive system in 
context, nevertheless the use of these devices remains immersive, 
these devices requires all the attention, independently from this 
one. 

This approach is often called ‘nomasime 6 , although this term can 
take different significance in other fields to find proximity, In ray 
of influence that remains to be determined, mobility is in fact 
related to features of the increasing data of the computing 
mobile. 

Keywords — mobility, extensible, mobile devices, WPAN, mobile 
failures, wired, Mobile Technology 

I. Introduction 

The mobile establishment carried out in wireless 
networking and the mobile terminals arouses Growing interest 
in computing .Moreover, the human being is characterized by 
his nomadisme, He seems to be the first to have left his original 
land and gradually inhabited the different continents .We do 
not want to return to the history of humanity, but just to 
underline that the human being is by nature nomad. Nowadays, 
this ambition of nomadisme is not only related to a burning 
desire of social advancement but also a new professional 
version. In matter in fact, in his professional entourage the 
human being uses daily different *, thus a new paradigm 
appeared, known by the name of mobile computing. 



Mobile computing offers a flexible mechanism of 
communications between users and an access to the group of 
services normally available in a classical environment through 
a network, independently from physical (geographical) 
localization and the user’s movements. 

II. MOBILE COMPUTING PROBLEMS 

The problems related to the mobile computing, and more 
specifically the once related to wireless networks and to mobile 



terminals. We try to characterize the impact of these problems 
on the applications distributed, the algorithm and the protocols 
from a network and system point of view. We limit ourselves 
to major mobile computing problems by demonstrating that all 
these problems converge toward the problem of disconnection 
which is the object of this thesis. 



Mobile computing is particularly influenced by the 

limitation of the variation of the bandwidth, 

These two characteristics introduce deadlines of data 
transition in the network. These deadlines degrades the 
performance of the applications we have, for example, the time 
constraints such as the multimedia interactive of the 

videoconferencing type, in the mobile environments, the 
variety of the bandwidth can be ,for example, the result of a 
change of the network at the time of the passage of a high- 
speed wireless network toward a network a very low flow, this 
variety can also be a result of the degradation of the signal 
because of obstacles (building, tunnels. . .). 

The applications distributed risk never functioning correctly 
in the presence of disconnection unfortunately the 
disconnections are frequent events in the mobile environment 
and should not be treated like failures and the applications 
distributed must function in the presence of disconnections as 
normally as possible, this requires mechanisms that must bring 
an added value that differentiates the applications and the 
systems for mobiles environment compared to those conceived 
for the telegraphic environment 

Otherwise, the mobile terminals become increasingly 
powerful in terms of resources offered, however these 
resources present performances which are far from reaching the 
performances of the fixes terminals ,in fact for a terminal to be 
mobile it must be light and in a small size, these characteristics 
limit the of storage capacity ,of the treatment and visualization. 

III. The strong points in mobile technology 

Wireless networks: in which at least two terminals can 
communicate without telegraphic connection, The wireless 
networks are based on a connection that uses radio waves 
instead on ordinary cables, the installation of such networks 
does not demand heavy installations of the existing 
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infrastructure as in the case of the telegraphic networks 
(trenching to convey the cables, equipment of buildings in 

wiring , chutes and connectors) which was worth the fast REFERENCES: 

development of this kind of technologies. 



The geographical perimeter defining the extent of wireless net 
wors make it possible to distingu is sevral categories, the 
wireless networks, personal, local, metropolitan extended set 

Wireless personnel network (WPAN): Concerning the wireless 
networks with a weak range (about a few tens of meters). this 
type mainly used to connect peripherals to a computer 
without a telegraphic connections , several technologies are 
used for the WPAN which the principal of is the Bluetooth 
technology , having the advantage of being far from greedy in 
terms of energy use, which make it adaptable for a use with 
small peripherals 

IV. Mobiles terminals 

The first mobile terminals were introduced with a CT 
technology, to replace telegraphic telephones; these terminals 
were in reality the beginning of a technology that does not only 
influence the telephones but also the personal computers ,the 
mobiles terminals are characterized with the resources offered, 
the obstruction, autonomy and the possible extensions, these 
characteristics allow choosing mobile terminals ,a voluminous 
mobile terminal offers more resources, but it consumes more 
energy which induces a weak autonomy, on the other side a 
smaller and lighter mobile terminal offers less resources but 
allows a good autonomy the extensions represent a 
considerable factor to the choice of a terminal mobile, thus, 
several configuration and architectures can contribute to 
improve their liability of the mobile terminals with the aim of 
meeting the requirements with the end-users on the market 



V. Conclusion 
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The concept of mobile computing the use of wireless 
communication technologies, allowed the appearance of new 
systems of communication that offers more advantages 
compared to the classical systems, the new systems do no 
compel the user with a fixed localization, but it allows him a 
free mobility. 

The mobile environments are characterized by the variability 
of the band-width network and the restrictions on the 
resources uses, especially if all the users of the system are 
mobile. 
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Abstract — Cyber Crimes are increasing day by day, ranging 
from confidentiality violation to identity theft and much 
more. The web activity of the suspect, whether carried out on 
computer or smart device, is hence of particular interest to 
the forensics investigator. Browser forensics i.e forensics of 
suspect’s browser history, saved passwords, cache, recent 
tabs opened etc. , therefore supply ample amount of 
information to the forensic experts in case of any illegal 
involvement of the culprit in any activity done on web 
browsers. 

Owing to the growing popularity and widespread use of the 
Google Chrome web browser, this paper will forensically 
analyse the said browser in windows 8 environment, using 
various forensics tools and techniques, with the aim to 
reconstruct the web browsing activities of the suspect. The 
working of Google Chrome in regular mode, private “Google 
Incognito Mode” and portable modes of operation is 
discussed at length in this paper. 

Keywords — Browser forensics, Private web browsing, 
Chrome Incognito, Chrome forensics, Portable browser 
forensics, Chrome artifacts. 

I. Introduction 

Internet has become the need of hour today. According 
to the Internet Live Stats (2015), 40% of the world’s 
population uses internet daily for a couple of tasks 
involving browsing internet for information or 
entertainment, social networking, email, e -commerce, 
gaming, blogging, banking etc. With such large number of 
internet users throughout the world, the number of cyber 
criminals has also come to a rise. Where the good guys 
benefit a lot from the internet, the bad guys also use it to 
carry out cyber-attacks, communicate with their peers, 
search for attack methods, preparing themselves for the 
crime etc. 

It is interesting to note that, from the websites visited, to 
the items downloaded, every web activity of the user gets 
stored on his device. Even a single word searched by the 
user leaves its trace somewhere in his computer and thus 
can be obtained by the forensic analyst if he/she carries out 
forensic analysis of the suspect’s browser. Thus, browser 
forensics supply ample amount of information to the 
forensic experts in case of illegal involvement of the 
culprit in any activity done on web browsers. 
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Forensic Experts should therefore have full grasp on not 
only the forensic analysis of well-known and well 
acknowledged browsers like Internet Explorer, Google 
Chrome, Mozilla Fire fox, Safari, Opera etc. but should 
also have hands on experience of less popular web 
browsers like Erwise, Arena, Cello, Netscape, iCab, 
Cyberdog etc. Not only this, the forensic experts should 
also know how to find artifacts of interest from older 
versions of well-known web browsers; Internet Explorer, 
Chrome and Mozilla Firefox atleast, because he might 
experience a case where the suspected person is using 
older versions of these browsers. 

According to StatCounter Global market share for the 
web browsers (2015), Google Chrome, Mozilla Firefox 
and Microsoft’s Internet Explorer make up 90% of the 
browser usage. 

Owing to the growing popularity of the Google Chrome 
web browser, having 48.71% of the web browser share 
alone, this paper will move around this web browser. The 
forensic analysis of Google Chrome, as carried out on HP 
Pavilion laptop running Windows 8 OS, in normal/regular, 
private (incognito) and portable modes of operation is 
discussed at length in this paper, to help the forensic 
investigators in investigations relating to web browsers. 

Various open-source forensic tools have been used 
throughout the research to provide maximum amount of 
human readable information to the forensic investigator/ 
practitioner, as retrieved from the Google Chrome’s 
default files and folders. The results have been tested with 
different forensic softwares/ tools to ensure the validity 
and accuracy of the obtained forensic results. 

II. Literature Review 

Current research in the field of browser forensics targets 
the stored files of widely used web browsers notably 
Google Chrome, Internet Explorer, Mozilla Firefox, Safari 
and Opera to extract data of interest. Emphasis nowadays 
is also laid on the structural analysis of internet log files 
from a forensic point of view to gather traces of the internet 
habits of the suspect under investigation. 

Where the web browser vendors endeavor to provide 
safe and secure browsing features to its customers, the 
forensic researchers are trying hard to dig out methods to 
combat these anti-forensics attempts on the web browsers 
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and reveal more and more internet activity of the user that 
gets stored on the disk even in the private or portable web 
browsing mode of operation. 

A number of freeware tools exist on the internet for 
carrying out the forensic analysis of the web browser’s 
history, cache, cookies, login data files etc. Most of the 
tools however, target only a single web browser and cannot 
create a real picture of the case if the culprit uses more than 
one web browser on his device. 

Since the scope of the research is confined to the 
forensic analysis of Google Chrome only, we assume that 
the suspect uses only Chrome on his computer, and thus 
the available open source Chrome analysis tools are 
sufficient to analyze the case forensically. 

A. Basics of Google Chrome 

Google Chrome, is the fastest and most used web 
browser in the world today. For Windows 8, Chrome stores 
its files and database in the following default locations in 
C drive. C:\Users\[USERNAME]\AppData\Local\Google\ 
Chrome\UserData\Default. 

The folder contains files of our interest i.e. Bookmarks, 
Cookies, Current Tabs, History, Last tabs, Login Data, 
Preferences, Top Sites and Web Data. These web browsing 
artifacts are stored in SQLite, SNSS (Session Saver) and 
JSON (Java Script Object Notation) formats. The structure 
of the DB file is quite different from that of other renowned 
browsers e.g. Mozilla Firefox. (Russ Taylor, 2014). 

Google Chrome stores the timestamps in Webkit format 
i.e. number of microseconds passed since 00:00:00 UTC 
of Jan 1, 1601. (Junghoon, Seungbong & Sangjin, 2011). 
However, some of the Chrome files have also been 
observed to follow a flavor of the Windows File time, 
which is basically 100 nano-second intervals since January 
1, 1601 UTC, divided by factor 10. For examination, any 
Chrome time decoder e.g. DCode etc. can be used by the 
forensic investigator to convert the timings given in history 
files to the desired format. 

B. Google Chrome ’s Web Browsing Mode 

Chrome web browser works in the following modes: 

• Regular Mode: It is the default mode that is most 
commonly used. It stores entire user’s activity on 
disk. 

• Private Mode: This mode is designed to give user 
privacy while surfing the Internet. It does not keep 
a track of all of the user’s activity. 

• Portable Mode: The mode allows user to install a 
portable web browser on a USB or cloud media, 
and run it on any PC. It provides the user 
portability to keep his browser files, websites’ 
passwords etc. with him all the time. 

III. Chrome Browser Forensics: Preparation 
and Procedure 

Forensic research in this paper is carried out on HP 
Pavilion laptop running Windows 8. Chrome 40 was 
installed on the PC for experimenting with regular and 
private mode of operation. It was made sure that Chrome 
is in use for more than one week, so that abundant amount 



of information is present to carry out its forensic analysis. 
However, for portable mode forensics, Goggle Chrome 
Portable Application was installed in USB and the 
experiment was repeated. 

In general, the investigation methodology depends 
largely on the OS installed on suspected PC, the web 
browser under investigation, the type of evidence etc. One 
way to analyze the browser forensically is to take the 
image of the hard drive, select some user’s search words 
from the history file, and use FTK Live Search option to 
search those keywords in the imaged drive. The data can 
then be authenticated using CRC (cyclic redundancy 
check), SHA-1 (Secure Hash Algorithm) or MD-5 
(Message Digest Algorithm). 

The second approach for browser forensics is to open 
each file present in the Default Chrome folder and analyze 
it separately for internet evidences using various forensic 
tools and techniques. Then, validate all the results with 
alternative open source tools too, if proprietary softwares 
have not been used in the investigation. This subsection 
however attempts to find artifacts using the second 
method. However, methodology 1 has also been used in 
the paper. 

Well for any methodology, it is important for the 
forensic team to know where he can find the data of his 
interest, for reconstructing the culprit’s web browsing 
activity, as shown in Table 1. 



TABLE 1 . WHERE TO FIND CHROME CONTENTS 



Content 


Found in (File/ Folders) 


Websites visited 


History, Cache, Cookies, 
Recovery Folders, Suggested 
Sites 


Visit count 


History 


Visit time 


History, Cookie, Cache, 
Recovery Folders 


Search Words 


Auto Complete, Cache 


Downloads 


Downloads, Cache 


Sites saved 


Bookmarks 



The forensic investigator must be equipped with a good 
collection of various open-source and proprietary browser 
forensics tools before starting the investigation. Table 2 
below enlists the softwares that will be used for forensic 
analysis of Google Chrome in this paper. 



TABLE 2. WEB BROWSER FORENSIC TOOLS 



Forensics Tool 


Contents analyzed 


Phrozen Browser 
Forensics Tool 


Scans browser’s history and 
keywords 


History Viewer 


History, Top sites, Cookies, 
Keyword, Downloads, 


MyLastSearch 


Search queries 


ChromeCookie View 


Cookies 


Chrome Password 
Decryptor 


Decrypts password 


ChromeCache View 


Cache 


Internet Evidence 
Finder 


Internet artifacts from unallocated 
space, default folders, 
pagefile.sys, hiberfil.sys 


Cookie Cutter 


Google Analytics cookies, Search 
terms 


Chrome Analysis 


History, Bookmarks , Cookies, 
Search words, Downloads 
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Chrome Session 
Parser 


Current and last sessions and tabs 


Web Historian 


History 



The forensic investigator while investigating any case 
involving web browser or any other illegal internet 
activity, should proceed with the following steps. 
(Newman, 2007). 

• Precisely define the scope of the investigation, 

• Maintain a detailed log of investigation activities, 

• Secure the computer/ laptop under investigation, 
Document the peripherals attached, hardware and 
software configurations of the system, 

• Connect a software or hardware write blocker to 
the PC, to prevent accidental damaging of any kind 
of potential evidence. Take snapshot or print any 
result that shows internet-abuse, 

• Preserve any data opened on PC, and the time 
stamps, 



IV. Chrome ’ s Forensics in Regular Mode 

This section will discuss the analysis of the artifacts 
stored on disk in the Regular Browsing mode from the 
forensic point of view. Chrome version 40 was installed on 
the laptop running Windows 8. This section covers the 
forensic analysis of Google Chrome by opening the files 
present in the Default Chrome folder i.e. History, Cookies, 
Bookmarks, Top Sites, WebData, Shortcuts etc., 
separately in various forensic tools and analyzing them for 
required internet evidences. 

A. History: 

The History file found in the ../Chrome/Default/History 
folder is basically a database file that contains record of 
user’s all web history. It contains tables for downloads, 
visits, urls, segmentusage, keywordsearchterms, meta, 
presentation, and segments, that provide useful 
information to the forensic experts about the victim’s web 
activity. The forensic investigator can simply use History 
Viewer tool to open the History file present in the Chrome 
Default folder. The software makes search easy for the 
investigator as seen in the Figure 1 below. 



• Run “Phrozen Browser Forensics Tool” to identify 
the web browser that the suspect uses most. This 
tool scans the history of widely known web 
browsers and provides statistics according to the 
browser usage, 

• If the browser is opened, check whether you can 
view Incognito sign or recent tabs in the Chrome 
Menu. Presence of sign and absence of recent tabs 
both indicate that the private mode has been 
enabled. In that case first collect the evidence from 
Chrome’s own settings bar before the session 
expires. Also collect Chrome’s default files and 
folder for further analysis. However, incase the 
browser is running in regular/ normal mode, 
simply collect all the default files and folders and 
analyze them using forensic tools. 

• If browser is closed, simply collect files from the 
default folder. The absence of URL name for even 
some of the results, indicate the usage of private 
web browsing mode by the suspect. Else it is 
evident that the user uses Chrome in normal/ 
regular mode, 

• Check registry entries for the USB devices 
connected to the computer so far and determine 
whether portable mode was enabled or not. If 
Chrome has been used in portable mode, trace the 
residual artifacts in the computer, 

• Also search File slack space, Swap Files, 
Pagefile.sys, Hyberfil.sys, SLogfile, Volume 
shadow copies, Unpartitioned space, Uninitialized 
file area, Unallocated clusters, $mft, etc. for trace 
of Internet activity. 

• Validate all results with alternative open source 
tools, if proprietary softwares have not been used 
in the investigation, 

• Present your findings. 



History Viewer 



URL History | Title 

http: //www. digital-detectiv. . . Digital Detective I T raining 
https: //www. google, com. p. . . mcs. nust. edu. pk • G oogle S earch 
https: //www. google, com. p. . . 

http: //www. nust. edu. pk/l . . . M ilitary College of S ignals (MCS) - 
http: //www. nust. edu. pk/l ... M ilitary College of S ignals (MCS) - 
https: //www. google, com. p. . . 

https: //Ims. nust. edu. pk/lm. . . N U S T LM S : M ilitary College of Sig... 
https://lms.nust.edu.pk/po... NUST LMS Portal: Login to the site 
https: //www. google, com. p. . . G oogle 
https: //www. google, com. p. . . 

http: //www. howtogeek. co. . . H ow to R ecover Accidentally D ele. . . 

https: //www. google, com. p. . . 

http: //productf orums. googl. . . G oogle G roups 

https: //productforums. goo. . . G oogle G roups 

https: //productforums. goo. . . G oogle G roups 

https: //www. google, com. p. . . 

https: //www. google, com. p. . . 

https: //play, google, com/st. . . T extS ecure Private M essenger - A. . . 



| Visit ... 1 Last Visited Time 
1 1/17/2015 7:30:34 PM 

1 1/17/2015 7:31:32 PM 

1 1/17/2015 7:31:40 PM 

1 1/17/2015 7:31:42 PM 

1 1/17/2015 7:31:42 PM 

1 1/17/2015 7:31:49 PM 

3 1/17/2015 7:48:07 PM 

1 1/17/2015 7:32:31 PM 

2 1/17/2015 8:16:47 PM 

1 1/17/2015 7:37:06 PM 

1 1/17/2015 7:37:10 PM 

1 1/17/2015 7:37:10 PM 

1 1/17/2015 7:37:13 PM 

1 1/17/2015 7:37:13 PM 

1 1/17/2015 7:37:13 PM 

1 1/17/2015 8:14:53 PM 

1 1/17/2015 8:15:08 PM 

1 1/17/2015 8:15:10 PM 



Figure 1 . History file opened in History Viewer 



To speed up the investigation, instead of looking for 
evidences in whole history file, forensic investigator can 
use Browser Forensic tool, to make up a list of keywords 
that he needs to search for in the history and start scan. 
Figure. 2 below shows part of keyword list generated for a 
sample search. 



® Google chrome 

Result found in google chrome history files 
anti-fbrensics 
V anti-forensics 



anonymous Document Display | NSCEP | U. . . 

anonymous 

anonymous 

V anonymous 10 Strategies To Fight Anonym. . . 

anonymous 10 Strategies T 0 Fight Anonym . . . 

tor browser Tor Browser Network Servers . . . 

tor browser Tor Browser Network Servers . . . 

V anonymous anonymous - Google Search 



file : ///C : /Users/Narmeen %20Pc/Desktop/hello/anti-fbrensics 
file : II /E: I msis 12/3rd %20sem/fbrensics/asgn/review%202/p[ 
http: //nepis . epa.gov/Exe/ZyNET. exe/30004NKA ,TXT?ZyAct 
http://nepis.epa.gov/Exe/ZyNET.exe/30004NKA.TXT7ZyAct 
https: //www . google . com . pk/url?sa =t&rct=j&q =&esrc =s&so 
http: //www . informa tionweek. com/security /vulnerabilities/ 10 
http://www.darkreading.com/vulnerabilities-and-threats/10- 
https : U www . facebook. com/stscomps/posts/793296 127384; 
http://cybernews2.blogspot.com/2015/01/tor-browser-netw 
https : U www . google . com . pk/search ?q =anon ymous&oq =ano 



Figure 2. Search Results for specific keywords searched 



Since open source tool has been used for the research, 
the forensic investigator can also view user’s web history 
details and download information in Chrome Analysis tool 
to verify his results. The problem for the forensic 
investigator while investigating history files is that Google 
keeps each web history artifact for a period of three months 
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only, after which it automatically gets deleted. The 
artifacts however do not get wiped off from the drive and 
are kept in archived history database. The forensic 
investigator can get the URL and download information 
from there, while the visit time gets expire. Hence it is 
difficult for the investigator to prepare timelines for 
evidences of cases more than three months old. (Craig 
Wilson, 2014) 

B. Search Keywords 

It is important for the forensic examiner to understand 
that the words searched by the user, on the web browser, 
simply get stored in the URL. Thus, if for instance, he 
comes across a URL say, 

http :// www. goo gle .com/ search?hl=en&source= 
hp&q=chrome&aq=f&oq=&aqi=gl 0 then it means that 
the suspect used Google.com host to search for the variable 
q i.e. Chrome in this case. (Junghoon, Seungbong & 
Sangjin, 2011). 

The recent search words of the suspect can be viewed by 
opening the Default Chrome folder in tool like 
MyLastSearch, History Viewer etc. 

C. Cookies 

Cookies are basically the SQL files, that websites create 
to store the users’ browsing information such as his/her site 
preferences, location or personal profile information etc. 
Cookies also help analyze web traffic and are often 
necessary for website’s functionality. The cookies are of 
two types; first party (set by site domain) and third party 
cookies (comes from sources that display items or adds on 
that particular page.) 

Google Chrome stores its cookies in ../Chrome/ User 
Data / Default folder. The users are given 5 options for 
cookies; allow, block, delete, make exception list, or keep 
cookies until the browser is open. Users, who do not wish 
to be tracked, must disable cookies. To view the cookies 
from Google Chrome, go to Chrome Menu > Settings > 
Show advance settings > Privacy section > All cookies and 
site data. This will open up the cookie console. Any cookie 
value when clicked opens up the dialog box showing 
name, content, domain, path, time of creation and 
expiration. For forensic investigator, detail as small as a 
cookie even helps progress the investigation because 
cookies prove that the user accessed that website at some 
time. 

Since there is no fixed structure for cookies, forensic 
investigator may face problem analyzing them. The best 
solution for that is to use any cookie viewing tool like 
Cookie Spy or Chrome Cookie Viewer. Figure 3 below 
shows a snapshot of the Chrome Cookie Viewer when run 
on the target PC. It provides the name of the cookie, host 
name, path, HTTP use, creation date, last access date, 
expiry date and whether security feature is enabled or not. 

Many websites nowadays, however use Google 
Analytics (GA) cookies. GA cookies are more structured 
than the standard HTTP cookies, and hence provides more 
authentic information regarding the user of that website, 
insight on how he found/ accessed that particular site, how 
many pages did he view and whether the user had 
established active session with the site before closing 
Google Chrome or not. (Nelson, 2012) 



* 








ChromeCookiesView: C:\Users\Narmeen Pc\AppD< 


File Edit View Options Help 


1 x a ! % s' s -a 


Host Name 


Path 


Name 


Value Sec... 


H... 


Last Accessed Created On Expires 


f .yandex.ru 


/ 


yandexuid 


No 


No 


1/17/2015 12:32:... 1/17/2015 12:32... 1/14/2025 12:32... 


t .yimg.com 


/ 


ypcdb 


No 


No 


1/18/201510:28:... 1/18/2015 10:28... 1/19/2015 10:28... 


t .zenoviaexchange.co... 


/ 


_zdeuuid 


No 


Yes 


1/17/2015 12:35c... 1/17/2015 12:33... 1/16/2017 12:33... 


% accounts.google.com 


/ 


ACCOU... 


Yes 


Yes 


1/18/2015 10:27:... 1/17/2015 12:47... 1/16/2017 12:47... 


t accounts.google.com 


/ 


LSID 


Yes 


Yes 


1/18/2015 10:27:... 1/17/2015 11:12... 1/16/2017 11:12... 


t accounts.google.com 


/o 


LSOSID 


Yes 


Yes 


1/18/2015 10:27:... 1/17/201511:12... 1/16/201711:12... 


f accounts.google.com 


/ 


GAPS 


Yes 


Yes 


1/18/2015 10:27:... 1/17/201511:43... 1/16/201711:43... 


% answers.yahoo.com 


/ 


ywandp 


No 


No 


1/18/2015 10:28:... 1/18/201510:28... 1/15/2025 10:28... 


t answers.yahoo.com 


/ 


fpc 


No 


No 


1/18/2015 10:28:... 1/18/201510:28... 1/18/2016 10:28... 


G beacon-1, newrelic.c... 


/ 


JSESSION... 


No 


No 


1/18/201510:28:... 1/18/201510:28... 



Figure 3. Cookies opened in Chrome Cookie View 



D. Login Data 

This file in the ../Chrome/Default folder stores the login 
credentials of user for various websites. The file stores the 
URL of website, username, password, actual name, date of 
creation in plain text as record. 

It is to be noted that Google Chrome leaves it upon the 
OS to secure the saved passwords. In most cases, the 
passwords will be stored in plain text. Figure 4 below 
shows the Login Data file present in ../Chrome/ UserData/ 
Default folder opened in a freely available SQLite DB 
browser. 

H DB Browser for SQLite - C:/Users/Narmeen Pc/AppData/Local/Google/Chrome/User 

File Edit View Help 

o New Database ^ Open Database g Write Changes jjfc Revert Changes 

Database Structure^] Browse Data Edit Pragmas Execute SQL 



Table: j logins ▼ (§j| New Record Delete Record 



origin_url 


action_url 


isername_elemen 


username.value 


jassword.elemen 


password.value 


submit 


| Filter 


| Filter 


| Filter 


| Filter 


| Filter | 


| Filter 


| Filter 


1 https://lms.nus... 


https://lms.nus... 


username 


13mmsisnshafqat 


password 


(BLOB) 




2 https://www.pa... 


https://www.pa... 


email 


narmeen91@g... 


password 


(BLOB) 




3 http://192.168.1 ... 


http://1 92.168.1... 


email 


narmeen91@g... 


pass 


(BLOB) 




4 https://myacco... 


https://myacco... 


email 


narmeen91@g... 


password 


(BLOB) 




5 http://www.nus... 






13mmsisnashaf... 




(BLOB) 




6 https://www.fa... 


https://www.fa... 


email 




pass 


(BLOB) 





Figure 4. Login Credentials opened in SQLite Viewer 

However in some cases, the user’s password is 
encrypted with triple DES algorithm, with the seeding 
input as the user’s own login password. The user is 
prompted to type his master password to view the Chrome 
passwords. Figure 5 as seen below, shows part of Chrome 
Password Decryptor, that the forensic analyst may use to 
decrypt the encrypted Chrome password. 
(Securityxploded, n.d). 




Figure 5. Chrome Password Decryptor 
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E. Shortcuts 

The shortcuts are the guess words that might help the 
user while typing his search keyword in the URL bar. They 
will appear as suggestion to him, like fac for 
facebook.com, gm for gmail.com etc. The Shortcuts file 
can be viewed in SQLite DB Browser to reveal the 
shortcuts. 

F. Top Sites 

Top sites, or the sites most visited by the user can be 
viewed by opening the ../UserData/Default/Top Sites file 
in DB Browser for SQLite. It provides URL along with 
data count etc. for the most viewed sites. However, 
“History Viewer” software can also show these details, as 
seen in Figure 6. 



URL 


| Rank 


| title 


http://gmail.com/ 


0 


Gmail 


https: //www. f acebook. com/ 


1 


(1) Facebook 


http: //security didici. cc/ 


2 


News : securitydidici.cc ■ 


http://elance.com/ 


3 


Hire freelancers and find 




https: //chrome, google. com/webstore?hl=en 


5 


Chrome Web Store 





Figure 6. Top Sites file opened in History Viewer 



G. Web Data 

The Web Data Chrome file stores the login credentials 
and auto fill data of the Chrome users. 

• The login credentials are only stored for the 
websites that the user has permitted to store the 
credentials for. 

• The auto fill data is the data with which the user 
has already filled any web form with. Chrome 
stores this data for user’s ease so that it may auto 
suggest the stored input when the user is about to 
fill the form. (Sarah, 2010) 

Figure 7 shows snapshot of the WebData file from 
../Chrome/UserData/Default folder opened in SQLite DB 
Browser. 




Table: , ~| autafill 



name 


value 


valuejower 


| Filter 


| Filter 


| Filter 


1 email 


elishbakhan@gmail.com 


elishbakhan@gmail.com 


2 Email 


narmeen‘91 


narmeen91 



Figure 7. WebData file opened in SQLite DB Viewer 



H. Preferences 

The Preferences file present in the 
..Chrome/UserData/Default folder indicates the user’s 
preferred settings for the Google Chrome web browser. 
The forensic investigator can simple open it in Chrome by 
double clicking on the file in the folder. Figure 8 shows 
part of the data of the Preferences file. 



"browser' 1 : { 

1, clear_lso_data_enabled'' : true, 

" 1 as t_c lea r_b rows in g_d a t a_t ime 11 : "13 0660 641 1077 5054", 

" 1 as t_k nown_go o gJLe_u r 1 " : " htt ps : / / www .google. com . p k/ " , 
,, last_prompted_google_url' 1 : "https: //www. google. com. pk/", 
ir pepper_fla5h_settings_enabled'’ : true, 

"window_p lac em e n t " : { 

Figure 8. Snapshot of Preferences file 

I. Bookmarks 

Bookmarks are URI’s (Universal Resource Identifiers) 
that are basically the shortcuts to the favorite or saved 
pages. If the website has been bookmarked, the user 
doesn’t need to remember the URL for opening it. Thus 
these bookmarks provide the forensic investigator idea of 
what kind of data or website does the user deems 
important. Bookmarked sites can be opened from the 
Chrome Menu > Bookmarks. 

Bookmarks tab only shows the URL of the websites that 
the user has bookmarked. This information is insufficient 
for a forensic analyzer to determine which bookmarks are 
recent. He may therefore open the bookmarks file located 
in Chrome/ UserData/ Default folder in notepad or chrome 
to see other bookmark’s parameters such as time of 
bookmarking, URL, type etc., as seen in Figure 9. 

"checks**" : "1981a695d49f48548974bl24aa9ed«82" , 

"roots": 1 

"book»ark_bar": { 

"children": [ { 

~date_added": “ 1306597866 14 39898", 

"id": "138", 

"na»e": "Browser Forensics ", 

"type": "url", 

"url" : "http: //www. browser -forensics . net/* 

h ( 

"date_added": "13861632869800080", 

“id" :~"50" , 

"na»e": "Security news aggregated", 

"type": "url", 

"url": "http://security.didici.cc/news" 

Figure 9. Chrome Bookmarks opened in Chrome 

To interpret the date added field as seen in figure 13, the 
forensic investigator may use Dcode Time decoder, or any 
other freely available decoder. Figure 10, shows the first 
date_added value converted to Pakistan standard time i.e. 
GMT+05. 




Figure 10. Chrome Time Decoding 
J. Bookmarks. bak 

Bookmarks.bak file contains the recent backup of the 
Chrome bookmarks, since the user last launched it. It 
overwrites this backup every time the user launches the 
Google Chrome web browser. 
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Incase the suspect has deleted the bookmarks before 
running away from the crime scene, the forensic 
investigator can restore them by deleting the bookmarks 
file present in ../Chrome/UserData/Default folder and 
renaming the Bookmarks. bak file to Bookmarks. All 
previously deleted bookmarks will be restored to the 
bookmarks bar and can be viewed upon opening the web 
browser. 

K. Cache 

Cache transparently stores website’s data so that future 
requests for that data can be served faster. The Chrome’s 
Cache folder consists of an index file, four data files and 
another hex file. Figure 1 1 below shows the view of cache 
folder, as seen from the Chrome Cache Viewer tool. 




ChromeCacheView: C:\Users\ 



File Edit View 


Options Help 










la *tilHia 


URL 


Content Type 


FileS... 


Last Accessed 


Server Ti... 


Server Last ... 


http://images.des.. 


image/gif 


918 


1/17/2015 11:11:46 PM 


1/17/201... 


4/20/2013 5:... 


http://static. spice... 


image/gif 


786 


1/17/2015 12:32:37 PM 


1/17/201... 


1/15/2015 6:... 


http://www.fo ren .. . 


image/gif 


185 


1/18/2015 1:24:25 PM 


1/18/201... 


5/4/2010 6:0... 


https://www.elan... 


text/html 


0 


1/17/2015 9:12:31 PM 


1/17/201... 




https://googlead... 


image/gif 


0 


1/17/2015 9:13:13 PM 


1/17/201... 




https://www.elan... 


text/html 


8,407 


1/17/2015 9:12:33 PM 


1/17/201... 




http://s.gr-assets.... 


image/gif 


3,531 


1/18/2015 1:03:15 PM 


1/13/201... 


12/16/2014 ... 


https://s.qr-asset... 


image/gif 


3,531 


1/18/2015 1:08:27 PM 


1/13/201... 


12/16/2014 ... 



Figure 1 1 . Cache file opened in Chrome Cache View 

Clicking on them, open up a dialog box, revealing more 
information regarding the cached object, for instance URL, 
content type, file size, last accessed date and time, server 
time, expire time, server response, eTag, Cache Control 
etc. 



L. Other Internet Artifacts 

Part of the memory of computer’s PC also stores 
artifacts/ message traces of web mailing and social 
networking sites like Facebook, Gmail, Yahoo etc. 
Therefore, the investigator must also look upon the 
pagefile.sys, hyberfil.sys, unallocated space, etc. for their 
artifacts too. 

The Internet Evidence Finder software can be used by 
the forensic investigator to find traces of social networking 
apps, webmail services, mapping queries, instant 
messaging applications, cloud based services etc. from 
different memory locations including the pagefile.sys, 
hiberfil.sys, $mft, unallocated clusters etc. A search 
summary and scan, was conducted as part of research 
using the Internet Evidence Finder tool to find the left over 
internet artifacts. The results of the search summary can be 
seen in the figure 12 below. 



1 Checked Source Located At Email (a) Status Subject Snippet Attachments Date/Time (Local) 

IS 103226\Twitta Status Art>facts\Twitter Status Artifacts Report.dat | Seccor 108486793 CSIRT <C3irt . helpdeskSmcs . edu . pk\ " r 
J1 releases security upaate tor nasn Fiayfer Dated: 14-11-2014 Adobe recently fixed 18 critical ^ 

- Player and None Sat, Nov IS, 2014 at 12:24 AM 141S998019013000 

31 C: (Unallocated clusters) Logical Sector 108486794 PEC CPD Short Course <pec.cpd.eelO@n 

• Short Course> Read Workshop on Optical Fiber Technologies on 27th Nov 2014 Dear All Assalan 

- Signals (MCS) , a Campus of National University of brochure.pdf, guide map to MCS.jpg, poster. i 

- Nov 14, 2014 at 1:32 PM 1415955145943000 

41 C: (Unallocated clusters) Logical Sector 108486795 CSIRT <csirt . helpdeskgmcs . edu . pk\ " r 

- iOs Security Flaw likely to hit 95% iOs devices Dated: 12-11-2014 According to the recent FireEj 



Figure 12. LEF Scan Results 



M. Deleted Chrome Data 

If however the culprit/user deletes the browsing history 
from the Chrome Menu> Clear History, all the history files 
present in the memory gets deleted. However, the 



downloads, cookies, and cache files are initialized to 
zeroes. (Junghoon, Seungbong & Sangjin, 2011). The 
default folder when opened in Access Data FTK Imager 
showed deleted files with a red cross, as seen in the Figure 
13 below. 



I puuKmarns.uaK. 


NO 


regular rue 


i j u-.uy.uu k rvi 


j Cookies 


1.943 KB 


Regular File 


1/25/2015 6:34:00 PM 


. Cookies-journal 


16 KB 


Regular File 


1/25/2015 6:34:00 PM 


CURRENT 


1 KB 


Regular File 


1/25/2015 6:32:59 PM 


Current Session 


134 KB 


Regular File 


1/25/2015 6:34:22 PM 


j 1 Current Session.FileSla... 


187 KB 


File Slack 




, Current Tabs 


1 KB 


Regular File 


1/25/2015 6:34:09 PM 


Favicons 


298 KB 


Regular File 


1/25/2015 6:33:57 PM 


Favicons-iournal 


16 KB 


Reaular File 


1/25/2015 6:33:57 PM 



Figure 13. Deleted File shown in FTK Imager 



In short, we observe that Google Chrome really stores a 
wealth of internet artifacts on the user’s drive. Among all 
the files, the history file alone provides sufficient 
information to the forensic investigator to reconstruct the 
timeline of user’s activity or at the least get the idea of his 
intention. The cached web pages not only provides user 
content of the sites visited by the suspect but also proves 
that the user has visited those particular sites. This 
observation is important in cases of child pornography, 
electronic fraud, non-repudiation cases etc. Moreover, the 
forensic investigator must also analyze the cookie’s file to 
get the session ID of the user, top sites file to look for 
suspect’s most visited sites, login data file to note suspect’s 
credentials for various websites, etc. In terms of privacy, 
the regular mode of operation is not at all secure, since it 
stores almost all the surfing activity of the user on the hard 
drive. However, with the availability of various browser 
forensics tools in the market, the forensic investigation of 
web browsers has been made quite easy for the forensic 
investigators. 

V. Google Incognito (Private Web Browser) 
Forensics 

Like Chrome, other well-known web browsers also 
allow user to surf internet in Private mode. The private 
browsing feature of web browsers gives the user freedom 
of browsing the internet, without keeping any record of his 
browsing history, cookies, temporary Internet files, 
usernames/passwords, form data etc. The feature also 
helps prevent him from third party websites that usually 
track user’s browsers activity. Thus, these types of 
browsing features tend to make the job of forensic expert 
or forensic investigator hard. (Donny & Narasimha, 2013) 

For the user to start the private browsing in Chrome, 
press Chrome Menu > New Incognito window. 
Alternatively incognito window can be opened from 
Ctrl+Shift+N shortcut. The user can enjoy private 
browsing unless he himself turns it off. 

A. Google Incognito providing false sense of 
security 

From a logical view point, it does make sense that for 
the browser to function normally, even in the private mode, 
it needs to write some amount of useful data or 
commands/instruction to a portion of disk. Where is it 
written, and whether it gets deleted or not, is a million 
dollar question because Google do not provide any public 
specification document on Google Chrome’s Incognito 
Mode Implementation. (Jessica, n.d). 
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Google Chrome does not store any web activity of the 
user who has enabled private browsing mode in his 
browser, but it certainly doesn’t stop the operating system, 
websites and the router from keeping account of what the 
user do. Thus in reality, these web browsers give a false 
sense of security to the users, and do not completely 
guarantee the secrecy and privacy of the user’s browsing 
activities. 

B. Detecting private mode: 

Since the evidence extraction for private mode of 
operation is slightly different from the regular mode the 
forensic examiner needs to know whether the suspect uses 
private mode or not. A simple indication for private mode 
on an opened browser is the presence of Incognito sign and 
the absence of Recent tabs bar. But if the browser if closed, 
the forensic examiner can run Chrome Cross -mode 
Interference inspector application on the suspect’s PC to 
see when the private mode was enabled and where was the 
data stored. 

The subsection that follows gives brief forensic analysis 
of the Google Chrome’s Incognito mode. Before this 
experiment, Google Chrome was fully uninstalled and then 
reinstalled so as not to confuse between the artifacts 
obtained from regular and private modes of Chrome. The 
chrome was opened in private incognito mode and some 
browsing, downloading and surfing was done to check 
whether private browsing mode keeps artifacts or not. The 
results of the research are: 

1. Web Activity 

Private browsing mode though enabled keeps track of 
user internet activity. The History file, present in the 
default Chrome Folder, when opened in SQLite browser 
after closing the browser, as proved by the snapshot in 
Figure 14 below, shows that Chrome does store visit time 
but not the actual URL of the page visited. However using 
BelkaSoft RAM capturer, the analysis of the captured 
RAM indicates the presence of browsed websites in the 
RAM. (Noorulla, 2014) 

DB Browser for SQLite - C:/Users/Narmeen Pc/AppData/Lo< 

File Edit View Help 

0 New Database ^ Open Database Write Changes p Revert Changes 

Database Structure Browse Data Edit Pragmas Execute SQL 



Table: [Hj visits ▼ [g] [n 



id 


url 


visittime 


from_visit 


transition 


segmentjd 


visit_di 


| Filter 


| Filter] 


| Filter | 


| Filter | 


| Filter | 


| Filter 


| Filter 


1 |l 


1 


130660731943... 


0 


805306368 


0 


0 


2 2 


2 


130659111083... 


0 


805306368 


0 


0 


3 3 


3 


130659071785... 


0 


805306368 


0 


0 



Figure 14. SQLite DB (History file after browser was closed) 



2. Cache 

Like regular browsing mode, private browsing mode 
stores cache files in the Temporary Internet Files folder of 
Google Chrome. Snapshot shown in Figure 15 below 
shows the cache files viewed in Chrome Cache Viewer, 
after the browser was closed. 



ChromeCacheView: C:\Users\Nc 



File Edit View 


Options Help 










URL 


Content Type 


FileS... 


Last Accessed 


Server Ti... Server Last ... 


https://oauth.go... 


text/javascript 


22,632 


1/18/2015 10:22:36 PM 


1/18/201... 


https://ssl.gstatic... 


. text/javascript 


3,408 


1/18/2015 10:22:36 PM 


1/14/201... 1/8/2015 5:2... 


https://apis.goog... 


application/javasc... 


4,933 


1/18/2015 10:22:36 PM 


1/18/201... 


https://www.goo... 


image/png 


2,372 


1/18/2015 10:22:35 PM 


1/18/201... 11/20/2013... 


https://www.goo... 


text/javascript 


1,934 


1/18/2015 10:22:35 PM 


1/18/201... 4/2/2014 5:3... 


https://www.goo... 


image/png 


9,737 


1/18/2015 10:22:35 PM 


1/18/201... 12/1/2012 1:... 



Figure 15. Cache file opened in Chrome Cache Viewer 



3. Cookies 

In private browsing mode, the cookies are associated 
with a session time, and hence expire once the browser 
closes. They can thus only be copied, if the browser is left 
open. 

4. Bookmarks 

They can be easily seen, by just clicking the Bookmarks 
file present in Default folder, even after session expires. 
Thus, bookmarks need to be manually deleted by user. 

5. Third Party Websites 

Private browsing may temporarily hide the data from 
someone, trying to search for browsing activity in browser 
history, but the third party websites are still able to trace 
the IP, track user’s activity and send malwares via 
links/pop-ups. 

6. Downloads 

The download list is cleared after the browser closes, but 
the downloads can still be seen in the downloads window 
and needs to be manually deleted. 

7. Other Observations 

WebData and Shortcut files when viewed in SQLite 
Viewer gave no data. Neither Chrome gives suggestions 
when typing URL/ form data nor it saves any user 
credentials. 

Though data is deleted at the expiry of the session in 
private web browsing but this data certainly does not get 
wiped off the drive. The forensic examiner should 
therefore know all places and folders where the internet 
activity of the user and browser preferences gets stored. He 
may use any forensic tool e.g. FTK Toolkit etc. to view the 
deleted data. (DFIRninja, 2014) 

VI. Google Chrome’s Portable Web Browser 
Forensics 

Operating in Portable mode means that the user installs 
the portable version of web browser i.e. “Google Chrome 
Portable” on a portable medium (e.g. a removable hard 
disk) or cloud service and uses it on any PC. Since the 
browser has been installed on the portable medium, the 
artifacts get stored in the same installation folder too. 

The portable feature allows the user to keep his data 
including downloads, bookmarks, saved videos, music, 
browser extensions etc. with him, all the time, in his 
portable drive. This not only loads the webpages faster but 
also provides greater privacy to the user by storing the 
browser related data in portable drive. 

Bad guys can use it in corporate computers where no 
browser is installed/ allowed. Portable browser however, 
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is a great challenge for forensic investigators, because if 
the removable media is unplugged by the suspect/criminal, 
the artifacts are out of the reach of the forensic examiner. 
(Divyesh & Nagoor, 2014). Essentially all the browsing 
history, cookies, cache files, downloads, auto fill data was 
required to be stored in the removable disk from which the 
browser was running, but it was interesting to note that 
Chrome’s portable mode of operation also leaves artifacts 
like browsing history, images, downloads, credentials, etc. 
on the host system in the NTFS Allocated and Unallocated 
space, Pagefile.sys, Memdump, System3 2/Wine vt/logs 
etc. (Donny & Narasimha, 2013). Hence the objective of 
this research was to find any artifacts of user’s activity in 
Google Chrome Portable that might have got stored in his 
PC. 

Another problem that forensic investigator faces while 
handling cases of portable web browsing is that there is no 
way to determine whether any web browser was used in 
portable mode or not. In such criminal cases, where the 
forensic analysis of web browser in portable media is 
required, the forensic analyzer must first check the list of 
portable/ removable media attached to the computer, by 
analyzing the Windows Pre-fetch files or the Windows’ 
registry via regedit command. He should then check 
locations where Portable Chrome might save artifacts, and 
then conclude his observations. 

Experiment involves downloading the Google Chrome 
Portable browser in Kingston 16 GB USB and installing it. 
Google Chrome application present in USB was then run 
on HP Pavilion laptop. For traces to be available in 
Chrome folder and other memory locations, surfing was 
done for quite some time and then changes in the default 
Chrome Folder were noted. As part of research, other 
important drive locations were also searched for internet 
evidences. 

The image of both hard drive and USB were taken via 
FTK Imager and then the search keywords were analyzed 
via FTK Search function. It was revealed that part of the 
browsing data was stored in 

GoogleChromePortable/data/profile/Default folder and 
free space in USB. However browsing history, cookies, 
cached websites, saved passwords etc., also got stored in 
the ../FocalSettings/Temp/GoogleChromePortable folder 
in the C drive, and remained there even after the USB, 
containing Portable Chrome application, was detached. 
Although the artifacts retrieved from the drive in this 
experiment were less than the artifacts found in normal 
Chrome browsing session, they are sufficient for the 
forensic investigator to reconstruct the browser session. 
Thus, Google Chrome Portable version also provides a 
false sense of security to the users. 

A. Forensics of Chrome Portable Incognito mode 

The private web browsing mode also exists within the 
portable Chrome browser. This Portable Incognito mode is 
even safer than the normal Incognito or portable mode of 
operation. Same experiment was conducted, but with 
Incognito mode enabled in Google Chrome Portable web 
browser. The results obtained from the FTK Five Search 
indicate that web browsing data get stored in the free disk 
space of USB. The incognito mode, despite its claim to 
rarely leave any trace on computer’s disk, leaves abundant 



internet artifacts in the virtual swap file. (Marrington, I, 
2012). The wealth of artifacts, however, varies according 
to the amount of PC’s RAM, number of active processes, 
and size of swap file. Since this virtual swap file is quite 
frequently overwritten, the investigator can only extract 
data for the most recent portable browsing sessions only. 

VII. Summary of Preliminary Changes 

The working of Google Chrome in all three modes of 
operation is quite different. Table 3 below, summarizes the 
forensic analysis of Google Chrome in normal, private and 
portable mode of operation. 



TABLE 3 . SUMMARY OF CHROME MODE OF OPERATION 



Mode 


Forensic Analysis 


Normal 

Mode 


• Browsing history, cached websites, cookies, 
downloads, saved passwords etc. are stored in 
..\Chrome\User Data Directory in C drive 


Private 

Mode 


• Cookies, Bookmarks, History etc. gets stored 
in the Default Chrome folder 

• Browsed websites can be seen in RAM 

• New timestamp replaces 

chrome_shutdown_ms.txt on session expiry 

• User credentials and videos not stored 


Portable 

mode 


Forensic artifacts were found in 

• Drive: Data is stored 
in../LocalSettings/Temp/ 
GoogleChromePortable folder in C drive 

• USB: Data gets stored in 
GoogleChromePortable/data/profile/Default 
folder and also in free space in USB 


Portable 

Incognito 

mode 


Forensic artifacts were found in 

• USB: Data gets stored in free space in USB 

• Drive: Data gets stored in pagefile.sys 



IX. Future Trends 

Web Browser forensics has become an important field 
of research for the forensic researchers. Today, most of the 
Web browser Forensic tools target any specific web 
browsers, and those few that are able to analyze multiple 
web browsers, lacks the accurate artifacts extraction. In 
order to address this issue, a methodology should be 
designed to analyze multiple browsers simultaneously 
with one tool, and integrate their data according to the 
timestamps for integrated artifact analysis. Based on this 
designed methodology, a forensic tool should be 
developed for the forensic experts, to speed up their 
process of investigation. Moreover, since the web 
browsers are updated frequently, forensic analysts must be 
able to forensically analyze the newer versions too. 

Tike regular browsing mode stores a lot of data 
pertaining to the user’s web activity on the drive, the 
private and portable web browsing modes that though 
claim to provide privacy to users are not really secure. 
Before accepting claims of privacy of portable and private 
modes of other web browsers i.e. Mozilla Firefox, Internet 
Explorer, Opera etc., the forensic examiners need to 
forensically analyze them too and find a way to trace the 
internet artifacts efficiently. Browser forensics should 
similarly be conducted on other Operating systems too. 

Artifacts of web mail, instant messaging and social 
media applications like Facebook, Gmail, Yahoo, Twitter, 
MySpace etc., as discussed in the paper, gets stored mostly 
in the hyberfil.sys and pagefill.sys. Thus these system files 
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must also be forensically analyzed in depth for more 
details. 

However, since the trend of computer is gradually 
shifting towards the smartphone, the forensic investigator 
must also thoroughly carry out browser forensic of smart 
phones. 

X. Conclusion 

No user can browse safely on the internet. Whether the 
user has enabled the privacy mode or is working on a 
portable browser application, the browser tends to store a 
large amount of data regarding the user’s surfing activity, 
his username passwords, downloads, temporary files, 
cache, form data and other browser specific data on the 
user’s hard disk, and that is from where the forensic 
examiner can collect the artifacts from, to reconstruct the 
timeline of user’s web activity. 

Browser Forensic Tools are the best source for the 
forensic experts to find the artifacts from web browser, in 
case of any suspected illegal Internet activity. The forensic 
experts can therefore utilize the efficiency of these forensic 
tools to find internet artifacts from various different 
locations in the computer’s memory. Though the stored 
web data can be traced down to the exact folder, the 
deletion of any evidence by the culprit can seriously affect 
the progress of the case. 
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Abstract — Natural language processing emerges as one of the 
hottest topic in field of Speech and language technology. Also 
Machine learning can comprehend how to perform important 
NLP tasks. This is often achievable and cost-effective where 
manual programming is not. This paper strives to Study NLP 
and ML and gives insights into the essential characteristics of 
both. It summarizes common NLP tasks in this comprehensive 
field, then provides a brief description of common machine 
learning approaches that are being used for different NLP tasks. 
Also this paper presents a review on various approaches to NLP 
and some related topics to NLP and ML. 

Keywords -Natural Language Processing , Machine learning , 
NLP, Ml 

I. Introduction 

A. Natural Language Processing 

Natural Language Processing is a hypothetically driven 
range of calculative techniques for analyzing and representing 
naturally texts at one or more levels of linguistic analysis in 
order to achieve human-like language processing for a range of 
tasks or applications [1]. 

The term Natural Language Processing surrounds a wide 
set of techniques for automated generation, manipulation and 
analysis of natural or human languages. NLP techniques are 
affected by Linguistics and Artificial Intelligence, Machine 
Learning, Computational Statistics and Cognitive Science. 
Here is introduction of some basic terminology in NLP that 
will be avail [2] : 

• Token: Linguistic units of an input text, such as words, 
punctuation, numbers or alphanumeric s. 

• Sentence: An ordered sequence of tokens. 

• Tokenization: The process of separating a sentence into 
its component tokens. 

• Corpus: A body of text, ordinarily containing a large 
number of sentences. 

• Part-of- speech (POS) Tag: A symbol stands for a 
lexical category, like NN(Noun), VB(Verb), 
JJ(Adjective), AT(Article). 

• Parse Tree: Describes the syntactic structure of the 
sentence as defined by a formal grammar. 



Now let's consider some common NLP tasks: 

• POS Tagging: Assigning POS tags to each word in the 
sentence. 

• Computational Morphology: Discovery and analysis of 
the internal structure of words using computers. 

• Parsing: Constructing the parse tree given a sentence. 

• Machine Translation (MT): Translating the given text 
in one natural language to fluent text in another 
language with a computer, without any human in the 
loop. 

For more information, an overview of NLP tasks is 
provided below [3]: 

• Low-level NLP tasks: Sentence boundary detection, 
Tokenization, POS tagging, Morphological 
decomposition of compound words, Shallow parsing 
(chunking), Problem- specific segmentation. 

• High-level NLP tasks: Spelling/grammatical error 
identification and Recovery, Named entity recognition 
(NER), Word sense disambiguation (WSD), Negation 
and uncertainty identification, Relationship extraction, 
Temporal inferences/relationship extraction, 
Information extraction (IE). 

A brief description about NLP, is shown in tables I and II. 
Table I illustrates some of NLP characteristics [1]. Table II 
presents levels of NLP [4]. 



TABLE I. Some Of Nlp Characteristics 



Origins 


Computer Science; Linguistic; Cognitive Psychology. 


Divisions 


Language Processing; Language Generation. 


Approaches to 
NLP 


Symbolic Approach; Statistical Approach; Connectionist 
Approach. 


NLP 

Applications 


Information Retrieval (IR); Information Extraction (IE); 
Question- Answering; Summarization; Machine 

Translation (MT); Dialogue Systems. 
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TABLE II. Levels of NLP 



Levels Of NLP 


Definition 


Phonetics And 
Phonology 


Knowledge About Linguistic Sounds. 


Morphology 


Knowledge Of The Meaningful Components Of 
Words. 


Syntactic 


Knowledge Of The Structural Relationships 
Between Words. 


Semantic 


Knowledge Of Meaning. 


Pragmatics 


Knowledge Of The Relationship Of Meaning To The 
Intentions Of The Speaker. 


Discourse 


Knowledge About Linguistic Units Larger Than A 
Single Utterance. 



B. Progression of NLP Research 

There are three different areas for progression of NLP 
research which will finally lead NLP research to grow into 
natural language understanding. The areas are the following 
[5]: 

• Syntax-centered NLP: Manage tasks such as 

information retrieval and extraction, auto- 
categorization, topic modeling, etc. Categories of this 
area are keyword spotting, lexical affinity, statistical 
methods. 

• Semantics-based NLP: Focuses on the inherent 

meaning associated with natural language text. 
Categories of this area are techniques that leverage on 
external knowledge or semantic knowledge bases, 
methods that exploit only intrinsic semantics of 
documents. 

• Pragmatics-based NLP: Attempt to understand 

narratives by leveraging on discourse structure, 
argument-support hierarchies, plan graphs, and 
common-sense reasoning. 

C. Machine Learning 

With advances in computer technology, we presently have 
the ability to store and process enormous amounts of data, and 
likewise to access it from physically far locations over a 
computer network. Most data acquisition devices are digital 
now and record dependable data. There is a process that 
explains the data that is observed. 

Though the details of the process underlying the generation 
of data are unknown, it may not be feasible to identify the 
process entirely, but it can construct a good and helpful 
approximation. Though identifying the complete process may 
not be possible, it can still be suitable to detect specific patterns 
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or regularities. This is the role of machine learning. Such 
patterns can help to comprehend the process, or use those 
patterns to make predictions. 

Application of machine learning methods to large databases 
is called data mining. In data mining, a large volume of data is 
processed to construct a simple model with valuable use. But 
machine learning is not just a database problem; it is also a part 
of artificial intelligence. To be intelligent, a system that is in a 
changing environment should have the ability to learn. If the 
system can learn and adapt to these changes, the system 
designer needs no predict and provides solutions for all proper 
situations. Machine learning also helps to find solutions for 
many problems in vision, speech recognition, and robotics. 

Machine learning is programming computers to optimize a 
performance criterion using example data or past experience. 
There is a model defined up to some parameters, and learning 
is the execution of a computer program to optimize the 
parameters of the model using the training data or past 
experience. The model may be predictive to make predictions 
in the future, or descriptive to obtain knowledge from data, or 
both. 

Machine learning uses the theory of statistics in building 
mathematical models, because the core task is making 
inference from a sample. The role of computer science is 
divided into two parts: First, in training, it is required the 
effective algorithms to solve the optimization problem, and 
also to store and process the enormous amount of data. Second, 
once a model is learned, its representation and algorithmic 
solution for inference needs to be efficient too. In particular 
applications, the effectiveness of the learning or inference 
algorithm, namely, its space and time complexity, perhaps are 
as significant as its predictive accuracy [6]. 

II. Machine Learning in Natural Language 
Processing 

Natural Language Processing (NLP) deals with real text 
element processing. The text element is transformed into 
machine format by NLP. 

A system capable of obtaining and combining the 
knowledge automatically is referred as machine learning [7]. 
Machine learning systems automatically learn programs from 
data [8] . Here are shown some machine learning methods [9] : 

• Classifiers: Document classification, Disambiguation. 

• Structured models: Tagging, Parsing, Extraction. 

• Unsupervised learning: Generalization, Structure 

induction. 

The application of machine learning to natural language 
processing is constantly increasing. Spam filtering is one where 
spam generators on one side and filters on the other side keep 
finding more and more talented ways to surpass each other. 
Perhaps the most striking would be machine translation. After 
decades of research on hand-coded translation rules, it has 
become apparent lately that the most favorable way is to 
provide a very large number of example pairs of translated 
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texts and have a program understand automatically the rules to 
map one string of characters to another[6]. 

A. Machine Learning Models 

Machine-learning models can be widely classified as either 
generative or discriminative [3]. These models are briefed here: 

• Generative methods: Create rich models of probability 
distributions; like Naive Bayes classifiers and hidden 
Markov models (HMMs). 

• Discriminative methods: Directly estimating 

succeeding probabilities based on observations; like 
Logistic regression and conditional random fields 
(CRFs), Support vector machines (SVMs). 

B. Some Statistical Machine Learning Approaches 

There are a wide variety of learning forms in the machine 
learning literature. However, the learning approaches other 
than the HMMs have not been used so widely for the POS 
tagging problem. However, all well-known learning forms have 
been applied to POS tagging in some degree. The list of these 
approaches is here. The interested reader can refer to them in 
the companion wiki for further details. 

• Support vector machines (S VM) 

• Neural networks (NN) 

• Decision trees (DT) 

• Finite state machines 

• Genetic algorithms 

• Fuzzy set theory 

• Machine translation (MT)ideas 

• Others: Logical programming, dynamic Bayesian 

networks and cyclic dependency networks, memory-based 
learning, relaxation labeling, robust risk minimization, 
conditional random fields, Markov random fields, and latent 
semantic mapping [10]. 

III. Classical Approaches to Natural Language 
Processing 

Text Preprocessing: text preprocessing is the task of 
converting a raw text file, basically a sequence of digital bits, 
into a well-defined sequence of linguistically definable units. 
Text preprocessing is a main part of any NLP system, since the 
characters, words, and sentences identified at this stage are the 
fundamental units passed to all more processing stages, from 
analysis and tagging components, such as morphological 
analyzers and part-of-speech taggers, through applications, 
such as information retrieval and machine translation systems. 

Text preprocessing can be divided into two stages: 
document triage and text segmentation. Document triage is the 
process of transforming a set of digital files into well-defined 
text documents. Text segmentation is the process of converting 
a well-defined text corpus into its component words and 
sentences. 

Lexical Analysis: Words are the building blocks of natural 
language texts. The techniques and mechanisms for performing 
text analysis at the level of the word, is lexical analysis. A basic 
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task of lexical analysis is to relate morphological variants to 

their lemma that lies in a lemma dictionary bundled up with its 
invariant semantic and syntactic information. The mapping of 
string to lemma, as the only one side of lexical analysis, is the 
parsing side. The other side is mapping from the lemma to a 
string, morphological generation. 

Syntactic Parsing: This approach presents basic techniques 
for grammar-driven natural language parsing, that is, analyzing 
a string of words (typically a sentence) to determine its 
structural description as a formal grammar. In most conditions, 
this is not an aim in itself but rather an intermediary step for the 
goal of additional processing, such as the assignment of a 
meaning to the sentence. Finally, the requested output of 
grammar-driven parsing is typically a hierarchical, syntactic 
structure appropriate for semantic interpretation. The string of 
words comprising the input will usually have been processed in 
separate phases of tokenization and lexical analysis, which is 
therefore not part of parsing proper. 

Semantic Analysis: In general linguistics, semantic analysis 
refers to analyzing the meanings of words, fixed expressions, 
whole sentences, and utterances in context. In practice, this 
means translating original expressions into some kind of 
semantic meta-language. There is a traditional division made 
between lexical semantics, which involves itself with the 
meanings of words and fixed word combinations, and supra- 
lexical (combinational, or compositional) semantics, which 
involves itself with the meanings of the unclearly large number 
of word combinations-phrases and sentences-allowable under 
the grammar. 

Natural Language Generation: Natural language 

generation (NLG) is the process by which thought is rendered 
into language. It has been studied by philosophers, 
neurologists, psycholinguists, child psychologists, and 
linguists. The people who look at it from a computational 
perspective study the generation in the fields of artificial 
intelligence and computational linguistics. From this viewpoint, 
the generator -the equivalent of a person with something to say- 
is a computer program. 

Its work begins with the initial intention to communicate, 
and then on to determining the content of what will be said, 
selecting the wording and rhetorical organization and fitting it 
to a grammar, by way of formatting the words of a written text 
or establishing the prosody of speech. The process of 
generation is usually divided into three parts, often executed as 
three distinct programs: 

1 ) Identifying the goals of the expression. 

2 ) Planning how the goals may be achieved by evaluating 
the situation and available communicative resources. 

3) Realizing the plans as a text [10]. 

IV. Some Empirical and Statistical Approaches to 
Natural Language Processing 

Corpus Creation: A corpus can be defined as a collection 
of machine-readable reliable texts (including official copies of 
spoken data) that is sampled to be representative of a certain 
natural language or language variety. Corpora provide a 
material basis and a test bed for building NLP systems. On the 
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other hand, NLP research has contributed considerably to 
corpus development, especially in corpus annotation, for 
example, part-of-speech tagging, syntactic parsing. Some 
issues in corpus creation are such as corpus size, 
representativeness, balance and sampling, data capture and 
copyright, markup and annotation, multilingual and multimodal 
corpora. 

Treebank Annotation: The main use of tree -bank is in the 
area of NLP, most often and significantly statistical parsing. 
Tree-banks are used as the training data for supervised machine 
learning (which might contain the automatic extraction of 
grammars) and as test data for evaluating them. Corpus 
annotation is not a self-contained task: it serves, among other 
things, as: 

1) A testing and training data resource for NLP. 

2) An inestimable test for linguistic theories on which the 
annotation schemes are based. 

3) A resource of linguistic information for the build-up of 
grammars and lexicons. 

Part-of-Speech Tagging : One of the earliest steps within 
this sequence is part-of-speech (POS) tagging. It is usually a 
sentence based approach and given a sentence formed of a 
sequence of words, POS tagging tries to label (tag) each word 
with its correct part of speech (also named word category, word 
class, or lexical category). 

This process can be considered as a simplified form (or a 
sub process) of morphological analysis. Whereas 
morphological analysis involves finding the internal structure 
of a word (root form, affixes, etc.), POS tagging only deals 
with assigning a POS tag to the given surface form word. 

Nevertheless, Part-of-Speech problems are ambiguous 
words and unknown words. POS tagging approaches are rule- 
based approaches, Markov model approaches, and maximum 
entropy approaches. 

Statistical Parsing: Statistical parsing means techniques for 
syntactic analysis that are based on statistical inference from 
samples of natural language. Statistical inference perhaps 
invoked for different aspects of the parsing process but is 
mainly used as a technique for disambiguation, that is, for 
selecting the most proper analysis of a sentence from a larger 
set of possible analyses. 

The task of a statistical parser is to map sentences in natural 
language to their preferred syntactic representations, either by 
providing a ranked list of candidate analyses or by selecting a 
single optimal analysis. 

Multiword Expressions: Languages are made up of words, 
which combine via morpho-syntax to encode meaning in the 
form of phrases and sentences. While it may appear relatively 
harmless, the question of what constitutes a "word" is an 
unexpectedly vexed one. To be able to regain the semantics of 
these expressions, they must have lexical status of some form 
in the mental lexicon, which encodes their specific semantics. 
Expressions such as these that have surprising properties not 
predicted by their unit words are referred to as multiword 
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expressions (MWEs). Types of MWEs are Nominal MWEs, 

Verbal MWEs, and Prepositional MWEs [10]. 

V. Components of Learning Algorithms 

A classifier is a system that inputs a vector of discrete 
and/or continuous feature values and outputs a single discrete 
value, the class. The three components of learning algorithms 
are [8]: 

• Representation: A classifier should be represented in 
some formal language that the computer can handle. 

• Evaluation: An evaluation function (also called 
objective function or scoring function) is needed to 
differentiate good classifiers from bad ones. 

• Optimization: A method is needed to search among the 
classifiers in the language for the highest-scoring one. 

A. Machine Learning Algorithms for Sentiment 

Classification 

Sentiment classification or Polarity classification is the 
binary classification task of labeling an opinionated document 
as declaring either a totally positive or a totally negative 
opinion and a technique for analyzing subjective information in 
a large number of texts. A standard approach for sentiment 
classification is to use machine learning algorithms. 

One of the machine learning algorithms is taxonomy based 
depending on result of the algorithm or type of input available. 
Description of these algorithms is summarized here [7] : 

Supervised learning: Creates a function which maps inputs 
to expected outputs also named as labels, like Nave Bayes 
classification, and support vector machines. Supervised 
learning learns classification function from hand-labeled 
instances. 

Unsupervised learning: Unsupervised learning is learning 
without a teacher. This learning Models a set of inputs, like 
clustering, labels are not familiar during training. As an 
explanation, one basic thing that you might want to do with 
data is to visualize it. Sadly, it is difficult to visualize things in 
more than two (or three) dimensions, and most data is in 
hundreds of dimensions (or more). Dimensionality reduction is 
the problem of taking high dimensional data and embedding it 
in a lower dimension space. Another thing you might want to 
do is automatically derive a partitioning of the data into clusters 
[ 11 ]. 

Semi-supervised learning: One idea is to try to use the 
unlabeled data to learn a better decision boundary. In a 
discriminative setting, you can accomplish this by trying to find 
decision boundaries that don’t pass too closely to unlabeled 
data. In a generative setting, you can simply treat some of the 
labels as observed and some as hidden. This is semi-supervised 
learning [11]. Semi- supervised learning Generate an 
appropriate function or classifier in which both labeled and 
unlabeled examples are combined. 
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B. Domain Adaptation Algorithms 

The concept of domain adaptation is almost related to 
transfer learning. Transfer learning is a public term that refers 
to a class of machine learning problems that include different 
tasks or domains. 

There are three general types of domain adaptation 
algorithms [12]. It could be understood easily when represented 
in a tabular form as given in table III: 



TABLE III. Comparison of Three Classes of Algorithms 





Classl 

(FST) 


Class2 

(PBA) 


Class3 

(ISW) 


Name 


Feature Space 
Transformation 


Prior Based 
Adaptation 


Instance Selection/ 
Weighting 


Level 


Feature 


Model 


Instance 


Dependency 


Moderate 


Strong 


Weak 


Domain Gap 


Large 


Large 


Small 


Extendability 
of Domains 


Strong 


Strong 


Weak 



VI. Word Sense Disambiguation in Brief 

Word Sense Disambiguation (WSD) is mainly a 
classification problem. Suppose a word in a sentence and an 
inventory of possible semantic tags for that word; which tag is 
suitable for each individual instance of that word in context. In 
many implementations, these labels are major sense numbers 
from an online dictionary, but they may also match to topic or 
subject codes, nodes in a semantic hierarchy, a set of feasible 
foreign language translations, or even assignment to an 
automatically induced sense partition. The nature of this given 
sense inventory significantly determines the nature and 
complexity of the sense disambiguation task [10]. 

Also applications of word sense disambiguation are 
applications in machine translation and applications in 
information retrieval. 

An amount of robust disambiguation systems with more 
simple requirements have been developed over the years. These 
systems are designed to operate in a standalone mode and make 
minimal hypothesizes about what information will be attainable 
from other processes. In machine learning approaches, systems 
are trained to do the task of word sense disambiguation. In 
these approaches, what is learned is a classifier that can be used 
to assign invisible examples, until now, to one of a fixed 
number of senses [4] . 

VII. NLP (Natural Language Processing) for NLP 
(Natural Language Programming) 

Natural Language Processing and Programming Languages 
are both based areas in the field of Computer Science. A 
computer program is usually composed of sequences of action 
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statements that point out the operations to be performed. 

Although they are both focused on a common idea -languages-, 
through the years, there has been only little communication 
between them. Here, the example below, tries to show how to 
convert natural language text into computer programs [13]. 

Starting with the natural language text, "Write a program to 
generate numbers between 0 to 10 and write these numbers out 
to screen", the system is analyzing the text with the goal of 
breaking it down into steps as the one shown below: 



"Write a program to generate 
numbers between 0 to 10 and 
write these numbers out to 
screen." 




1. Step Finder: identifies actions (verbs &objects). 

2. Loop Finder: identifies repetitive statements. 

3. Comment Identification: identifies descriptive 
statements. 




#include<stdio.h> 
main ( ){ 

/*show numbers between 0 to 10*/ 
int num; 

for ( num=0 ; num<=10 ; + + num) 
printf ("%d\n", num) ; } 




Figure 1. The natural language (English) and programming language (C) 
expressions for the same problem 



VIII. BioNLP: Biomedical NLP 

BioNLP, also known as biomedical language processing or 
biomedical text mining, is the application of natural language 
processing techniques to biomedical data. The biomedical 
domain presents a number of unique data types and tasks, but 
concurrently has many aspects that are of interest to the 
"mainstream" natural language processing community. 
Moreover, there are moral issues in BioNLP that force an 
attention to software quality assurance beyond the normal 
attention that is paid to it in the mainstream academic NLP 
community [10]. 

There are two basic user communities for BioNLP, and 
many different user types within those two communities. The 
two main communities are the clinical or medical community 
on the one hand, and the biological community on the other. 
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IX. CONCLUSION 

Natural Language Processing is widely considered to be the 
area of research and application, as compared to other 
information technology approaches. There have been adequate 
successes that propose NLP-based technologies will continue 
to be a great area of research and development in information 
systems now and in the future. Also Machine Learning is a 
significant application in NLP that can never be ignored. ML is 
truly a very important and elaborate, however a necessary task 
in the NLP development applications. 

The article is initiated with a brief review of Natural 
Language Processing and Machine learning. Machine learning 
models and approaches are defined in section II. There are 
numerous possible benefits from utilizing every model. Some 
classical and statistical approaches to NLP are introduced in 
section III and IV. 

Domain adaptation approaches in NLP tasks are presented 
in section V that are dispersed to various classes. WORD sense 
disambiguation is declared briefly in section VI. In section VII, 
it is believed NLP for programmers will become main 
technology in their information life. It will be a great event in 
the ML history. NLP for NLP, appears to be a promising model 
especially focusing on standardizing APIs, working on large 
databases like calling functions just by speaking of humans to 
machine, and improving IDEs for complex services. Hence 
there is a scope for further research in these areas. At last, 
BioNLP is showed in short. 
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Abstract — Web services Compositions are rapidly 
gaining acceptance as a fundamental technology in the 
web fields. They are becoming the cutting edge of 
communication between the different applications all 
over the web. With the need for the ubiquitous 
computing and the pervasive use of mobile devices, the 
context aware web service composition becomes a hot 
topic. This later aims to adapt the web service 
composition behavior according to the user’s context 
such as his specific work environment, language, type 
of Internet connection, devices and preferences. Many 
solutions have been proposed in this area. 
Nevertheless, the adaptation was carried out only at 
the runtime and it partially covered the user’s general 
context. In this paper, we introduce a new context- 
aware approach that provides dynamic adaptation of 
service compositions. Our approach allows to express 
requirements by taking into account potential user’s 
context in addition to the functional one. 

Keywords -component; UDDI, AAWS-WSDL , Dynamic 
context , SAB PEL, CAC-WSR 

I. Introduction 

A characterizing feature of SOA-based systems 
is their high dynamicity in selecting the functions 
that satisfy user requirements. Service composition 
plays a fundamental role in this kind of software 
systems. So far, many researchers have investigated 
different techniques to support the automatic 



generation of service compositions from a set of 
published services (domain), given a specific goal to 
reach (problem). 

SOA suffers from a number of limitations and 
weaknesses in the context of composition on 
demand; hence web services adaptation to context 
remains essential to better exploit services. 

Web Services Composition based on ‘context’ 
can give a personalized behavior to the client by 
utilizing information about the client. The 
information can be anything, such a profile of the 
person, which includes name, country, language, his 
current location or it can also be the device on which 
he intends to invoke the service. For e.g., if the client 
wants to access a web page on his mobile device, 
then the page needs to be modified or customized so 
that it can fit into the requirements of the mobile 
device, the memory to be used. 

In addition, most of the existing Web services 
compositions are not originally developed to be 
adaptable. Essentially, tools and mechanisms are 
required to describe and publish adaptable atomic 
Web service and composite web service. In fact, 
current web service standards WSDL, UDDI and 
BPEL are limited to functional view and do not 
support context information. 

Eventhough previous solutions take into account 
non-functional properties of Web service, they are 



Google scholar 

https://sites.google.com/site/ijcsis/ 
ISSN 1947-5500 



107 



(IJCSIS) International Journal of Computer Science and Information Security, 

Vol. 13, No. 9, September 201X5 



limited to atomic web service. Moreover, most of 
researches focus only on the publishing of adaptable 
atomic service description level; they did not deal 
with the publishing issue of the adaptable service 
composition. 

In the following, we propose a possible 
extension of WSDL and called it AAWS-WSDL 
(Adaptive Atomic Web Service Web Service 
Description Language). 

On the other hand, we propose an UDDI 
extension that supports context aware service 
composition based on the AAWS-WSDL and 
SABPEL [3]. In addition, we extend the BPEL 
description with the context properties related to the 
users. We propose a new registry called context 
aware composition web service registry (CAC- 
WSR) that publishes adaptable composition web 
services and allows users to select composite 
services according to their profiles. 

The remainder of our article is organized as 
follows. Section 2 describes adaptable service 
composition framework (ASCF), Section 3 describes 
modelling layer. Section 4 describes the description 
layer. Section 5 describes publication layer. Section 

6 describes the CAC-WSR experimentation. Section 

7 discusses some related works, and finally section 8 
concludes the paper. 

II. ASCF: ADAPTABLE SERVICE 
COMPOSITION FRAMEWORK 

In order to reach the full potential of Web 
services, they can be combined to achieve specific 
functionalities. If the implementation of Web service 
business logic involves the invocation of other Web 
services, it is called a composite service. The process 
of assembling a composite service is called service 
composition. Web services run in a context (e.g. 
their operating computing infrastructure). In an ideal 
scenario, Web service operations would do their job 
smoothly. But, several exceptional situations may 
arise in the complex, heterogeneous, and changing 
contexts where they run. 



For example, one of the partner services may go 
down, it may be updated to require new policies, etc. 
When such situations occur, the composite web 
service will not operate appropriately and runtime 
faults will be thrown. Most available web service 
orchestration engines do not provide automated 
support for detecting and reacting to such situations 
and handling them can be done through manual 
human intervention, i.e., the fault must be detected, 
the web service based process that threw the fault 
needs to be undeployed, certain changes should be 
applied either to the process itself or to its 
deployment configuration (e.g., replacing the faulty 
partner service, changing the setting of the 
interactions to be secure, etc), and then restarting the 
composite service. Such an approach is 
inappropriate because the operation of the composite 
service is discontinued, because certain processes 
may be interrupted in the middle of a business 
transaction, and also because of the huge 
administrative overhead. 
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Fig.l : Adaptable web Service Composition main Layers 

The need of managing adaptive services 
composition impels the incorporation of facilities to 
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deal with user’s context in all the services 
composition life cycle. We define the user’s context 
as a set of context information composed of device 
characteristics, connection, location and a set of 
preferences in terms of content and presentation. 

To achieve this objective, we propose in this 
section a framework that supports the development, 
the description and the publication of adaptive 
service composition (ASC). Our goal is to offer 
relevant and suitable information depending on the 
user’s particular profile. Therefore, developers have 
to integrate software facilities dealing with the 
context characteristics. Then, context annotations 
must be used to extend BPEL descriptions. 
Moreover, UDDI must be extended to include 
contextual information of the service composition in 
addition to the atomic web service information and 
the composite web service information currently 
stored in UDDI registries. Our framework is 
organized in a layered architecture and it consists 
mainly of four layers providing specification, 
modeling, description and publication solutions. 

Figure 1 illustrates an overview of our 
framework’s main layers. Each one describes our 
proposal to a given stage of the Web service life 
cycle. 

Modeling layer: It focuses on the design of 
the application behind the atomic service and 
composite service. In fact, in this layer we propose a 
modeling tool called SCA (Service Component 
Architecture). The SCA allows to provide a 
programming model for designing applications and 
SCA solutions based on a Service Oriented 
Architecture (SOA). It is based on a series of 
services, which are assembled together to 
manipulate solutions that serve a particular business 
need. These composite applications contain both 
business function from existing systems and 
applications and also new services created 
specifically for the application, reused as part of the 
composition. 

Description layer: It concentrates on the 
annotation of the BPEL description document with 



contextual information managed by the composite 
service. The adaptable description named SABPEL 
[3] is automatically generated from the service 
implementation using our SABPEL generator. 

Publication layer: It gathers atomic web 
services features, composite web service and context 
supported on a UDDI extended registry named 
CAC-WSR [18]. 

The following sections will concentrate on a 
detailed description of each layer. 

III. Modeling layer 

The SCA specifications concentrate on the task of 
describing the assembly and the configuration of the 
components that compose an SOA application. This 
assembly is used as input to initialize and instantiate 
the application. But, the SCA specifications do not 
address the runtime management of the application, 
which typically includes reconfiguration and 
monitoring. 

Furthermore, the SCA specification does not address 
either the run-time management of the platform 
itself. Yet, these properties are almost mandatory for 
modem SOA platforms in order to be able to adapt to 
changing operating conditions, to support online 
evolution, and to be deployed in dynamically 
changing environments (e.g., ubiquitous 
environments) [3]. 





Service 

Coutexi 



WSDL \ 



wh service A P 



Eig.l : SCA component A extended with management interfaces 

In this section, we present the design of the self- 
adaptation Web service which addresses these 
problems. 

Our solution relies on the addition of the reflection 
mechanisms. Namely, we use these mechanisms to 
provide self-adaptable service using ReMoSSA 
reference model [3] in dynamic context. In fact, we 
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envision separate components for monitoring 
context. These components are attached to each self- 
adaptation web service. From an external point of 
view, we added, at design phase, a set of additional 
interfaces to web service A. In fact, we transformed 
the Web service A into Self-adaptation Web service 
A, as shown in Figure 2. The general structure of our 
conception is shown for an individual web service A 




Fig. 3: Extended SC A component A with all its attached reflective 
components 

Web service A is extended with component for 
service context and component for each task of 
reflection mechanism. 

So, the original “service” and “reference” interfaces 
of Web service A are promoted to the interface of 
Self-adaptation web service A, so that, from a 
functional point of view, this self-adaptation web 
service A can be employed in the same way as the 
original Web service A. 

In our architecture, we integrate reflection 
mechanisms as self-adaptation mechanism that can 
be used to adapt dynamically the behavior of 
applications. A reflection mechanism provides the 
ability of the service-based applications to observe 
and to modify its computation. 

This layer supports adaptive context-awareness 
capabilities. It is designed to offer runtime adaptation 
to dynamic environments. 

We introduce this mechanism in this layer for 
dynamically adapting the service and composing in 
two levels: 




Fig. 4: Metamodel of the AAWS-WSDL 
A base-level subservice provides the system’s 
domain functionality (i.e., application logic). It 
composed by interceptor components as architecture 
pattern. Interceptors use for changing services 
transparently during runtime, adapt services for fault 
tolerance (e.g., failure of servers), and adapt services 
according to capacity and load of resource (e.g., 
distribute the load between servers). 

The second level is the reflective subservice (meta- 
level); it defines the context the web service (the 
platform context, the infrastructure context and the 
application context). 

IV. Description layer 

In order to allow the provider to publish adaptation 
criteria on the one hand and to allow the consumer to 
choose from a list of published Web services 
compositions the most appropriate description 
according to his needs and profile on the other hand , 
we suggest additional information in the description 
file. We therefore propose an extension of the two 
standard WSDL and BPEL having the ability to 
describe composite and atomic service profile 
information. Our extension, named AAWS-WSDL 
and SABPEL, is detailed in [4]. The SABPEL 
document is the output of our description layer. 
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The SABPEL is obtained through a model 
transformation from the adaptive design described in 
the previous section. 



BPEL Context aware 

Functionalities composition 




SABPEL 



a 



SABPEL 



Fig. 2: Description Layer of the ASCF 

Figure 5 illustrates the transformation process in the 
description layer. 





We define the composite context information that is 
“Enhanced high level context information by 
integration context information related with atomic 
service for decide or supply the context aware service 
composition of application. In order to provide 
composite contextual service, context ontology is 
constructed. 




Fig. 3: Metamodel of the SABPEL 

A. Context-aware Service Composition: 
Intersection of atomic Web service 
context 

For our study, we present composite context 
information concept as shown in Figure. 6. 

First of all, we have to define the composite context 
information’s concept for our research. 



Fig. 5 : Intersection of atomic Web service context 
Each elementary web service has atomic context 
ontology which is stored in OWL file. This atomic 
context ontology is automatically generated from 
AAWS-WSDL description file. Figure 8 shows an 
example of composite context intersection using 
three atomic contexts: 

Context Fowl describes the context of hotel 
service (as shown in figure. 8). 

Context2.owl describes the context of airline 
service (as shown in figure. 8). 
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Context3.owl describes the context of bank 



service (as shown in figure. 8). 

In fact, figure. 8 shows a composite context of 
travel agency service. This later uses three atomic 
contexts (Hotelcontext, AirlineContext and 
BankContext. The hotel service uses three 
languages as user’s preference (French, English and 
Arabic); The Airline service uses two languages as 
user’s preference (French, and English); and the 
bank service uses two languages as user’s 
preference (French and Arabic). Therefore, the 
composite context of travel agency uses one 
language (French) that is an intersection of the three 
atomic contexts. 



Travel Agency 
mf ' Hotel 
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Context, Preferences are the main concepts in this 
ontology 




We can distinguish a Context as a composite or an 
atomic. Differently from an Atomic context, a 
Composite context can be further intersection in one 
or more atomic context that can be helpful for a 
better characterization of the associated context 
aspect. 



V. Publication layer 



Fig. 6 : Example of Composite Context 

B. Context-aware Semantic Service Design 

To support design and composition of context-aware 
services we propose: an OWL ontology (OWL- 
SABPEL), supporting the description of sets of 
contexts in specific domains and an extension of the 
OWL-S ontology for services. 

Context-aware semantic descriptions can be 
exploited during the service composition process to 
automatically generate compositions better -tuned to 
the requestor’s behaviors and preferences and to the 
particular situation of the environment. 

Figure. 10 shows OWL-SABPEL, an extensible 
OWL ontology for describing contexts according to 
AAWS-WSDL, described in Section 2. Profile, 



A service provider publishes its service at a service 
registry using a publishing interface. The public 
interfaces and binding information of the registered 
services are clearly defined in the WSDL standard 
language. A registry organizes the published services 
and provides a query interface that enables a service 
consumer to search for a needed service, and obtain 
its provider’s location information. 

A service consumer then interacts with a service 
provider through the SOAP protocol. The most 
known registry is UDDI which is an XML-based 
standard that was proposed for allowing providers to 
publish their web services, so that they can be located 
afterwards by consumers. A UDDI registry is 
structured into three components: BusinessEntity or 
white pages, BusinessService or yellow pages and 
BindingTemplate or green pages. These components 
allow the search for a suitable web service in the 
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UDDI 1 registry according to the three data types. 
Unfortunately, web service discovery through UDDI- 
based registries is insufficient to carry out consumer 
adaptation requirements. This necessity is increasing 
every day with the emergence of portable devices 
and various users preferences. Hence, the provider 
and the consumer want to have a registry providing a 
fairly precise publishing and search capabilities to 
render the use of user-aware web services more 
efficient. To overcome these needs, we propose a 
web service registry offering the ability to save 
adaptation criteria of atomic web service and 
adaptation criteria of composite web service. The 
registry is called context aware composition web 
service registry (CAC-WSR) and it supports the 
publication of the SABPEL description presented in 
section 4. 




Fig. 8: Publication process in the CAC-WSR Registry 
We extend the UDDI data structure by an atomic 
AdaptationCriteria element which makes reference to 
the item UserProfile in the description file AAWS- 
WSDL. In addition, we extend the UDDI data 
structure by an composite AdaptationCriteria element 
which makes reference to the item UserProfile in the 
description file SABPEL. When a supplier requests 
the publication of his service at the registry, an 
analyzer module retrieves the AAWS-WSDL 
description file and the SABPEL description file. 



Then features and access points description will be 
transferred to the Services features database and 
those corresponding to the adaptation criteria will be 
transmitted to the Service Profile database. 

To implement our registry, we propose an extension 
of the JuddiV3 2 API. 

Therefore, we use the apache Tomcat as a web server 
and the MySQL to implement the UDDI data Base. 
We offer a user-interface allowing the provider to 
specify atomic and composite service information 
and to load the description file of its service. Then, a 
parser analyzes two files (the AAWS-WSDL file ans 
SABPEL file). It allows the extraction of the 
functional description carried out by the elements 
(types, portype, message, binding, service) and stores 
it in the database ServiceProfile. 

Then it extracts the profile description contained in 
the element UserProfile and saves it in the database 
ServiceLeature. The db-atomic-context table stores 
the profile of the atomic web service. This table 
refers to the binding - template table that stores 
atomic web service instances. 

The db-bpel-context table stores the profile of the 
composite web service. This table refers to the 
binding - template table that stores composite web 
service instances. 

Our extended registry is compatible with the basic 
UDDI and both of them could coexist in the same 
environment. Publication process is shown in figure 
11. To publish AAWS-WSDL description of the the 
atomic web service in CAC-WSR. The definition 
document which is associated with the supported 
profile is chosen by the provider via the publishing 
interface illustrated in figure 12. The provider have 
to specify the enterprise information, the service 
information, the web server and the description file. 
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Fig. 9: Publish AAWS-WSDL description of the the atomic web 
service 

The CAC-WSR registry contains also a query 
analyzer module used in the selection 
and discovery phase. This module communicates 
with the database Web service Functionalities and 
ServiceProfile in order to select the suitable list of 
Web services meeting user’s query in terms of 
functionality and adaptation. The query analyzer 
allows the customer to: 




Fig. 10: Publish SAB PEL description of the the composite web 
service 

Extract the needs in terms of functionality 
through keywords, 

Extract the needs in terms of adaptation 
through a comparison between the user’s 
profile and the service profile stored in the 
CAC-WSRegistry. The user’s profile is 
automatically detected by a detection module 
named WEDM (WildCat Extension Detected 
module) which is an extension of the WildCat 
API and it is stored in an XML file. 

Composite service finding process over CAC- 
WSRegistry is depicted by figure 14. 
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Fig. 1 1 : Finding composite service process in the CAC- 
WSRegistry 



VI. Illustration of Use 

In this section, we illustrate the feasibility of our 
proposed framework ASCF through the example of a 
Travel Agency composite web service. The service 
client wants to book a room in a hotel on the basis of 
his profile characteristics. We provide corresponding 
solution to each layer. 

A. Modeling layer 

This layer’s focus is on the design of the Travel 
Agency web service features as well as the Web 
service adaptation ability. Figure 16 shows an extract 
of the SCA diagram in eclipse. 

Figure 15 shows the generation of AAWS-WSDL 
hotel service using SCA diagram. The reflection 
model allowing to represent the supported content 
and presentation preferences of the hotel service. 
This component is necessary to generate the AAWS- 
WSDL file for web service. 
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Fig. 12:Generation of AAWS-WSDL Hotel Service 



In our case study we configure the preferences of the 
human consumer hotel service. Clearly, for a chosen 
element, we can select multiple choices. For instance, 
police size could be ”10” or ”12”. Also, we can add 
new values in the proposed textfields. A temporary 
file is associated to every configured view. Then, 
they are gathered into the AAWS-WSDL description 
final document. An extract of the generated 
description file is illustrated in figure 16. 



<aaws-wsdl : profile xmln3:aaw3-wsdl-"http;//local?iost: 8080/AAWS-WSDL/vsdl ” 
name- "ffumanConsumer " > 

<aaws-wsdl : pref erences> 

< a aws-wsdl : af f ichage> 

<aaws-wsdl : langue>Arabic</a aws-wsdl : langue> 

<aaws -wsdl :pays>Tunisia</aaws -wsdl :pays> 

<aaw3 -wsdl : tailles-police> 

<aaws -wsdl : taille>10</aaws wsdl : taillo 
<aaws -wsdl : taille>12</aaws-wsdl : taille> 

</aaws-wsdl : tailles-police> 

<aaws -wsdl : polices> 

<aaws-wsdl :police>Aharoni</aaws -wsdl :police> 

< aaws -wsdl : po 1 i c e >A1 ge r ian< / a aws-wsdl : pol i ce > 

</ a aws-wsdl :polices> 

</aaws-wsdl : af f ichage> 

<aaws-wsdl : contenu> 

<aaws-wsdl : image >true</ a aws-wsdl : imago 
<aaws -wsdl : video>false</ a aws-wsdl : video 
<aaws-wsdl : texte>true</aaws-wsdl : texte> 

<aaws-w3dl : audio>false</aaws-wsdl : audio 
<aaws -wsdl:cableau>false</aaws-wsdl:tableau> 

</ a aws-wsdl : contenu> 

</ a aws-wsdl : pref erences> 



Fig. 13: Extract from the Hotel description file 



In this example, we have a HotelService 
component corresponding to the BookingHotel main 
feature provided to all kind of users, a 
FlightService component corresponding to the 
BookingFlight main feature provided to all kind of 
users and a BankService component corresponding 
to the payment main feature provided to all kind of 
users. In addition, we add the EventBD component 
to retrieve the list of promotions or to add the travel 
agency service to its list of activities. 




Fig. 14: Travel Agency web service 

We can distinguish the travel agency component 
which invokes the hotel service, bank service and 
the flight service. 

B. Description layer 

The Travel Agency web service modeled above 
using our tool SCA diagramm is then implemented in 
Java. The next step is to generate the 

SABPELdescription file gathering all needed 
information about the available functionalities, 
access links and supported profiles. Figure 18 
illustrates the SABPEL generator interfaces. 




Fig. 15: The SABPEL generator GUI interfaces 

As input, we provide the BPEL and the WSDL of the 
composition of the Travel Agency adaptable service. 
Then the provider could select, through the plug-in 
Interface, the composite context of travel agency (as 
shows in figure 8). This later is an intersection of 
three atomic context (the context of hotel service, 
bank service and flight). This composite context is 
included in the BPEL description file. 
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</opetaxion> 



service c>i f ■ sii ics - ft strict ttovp* $ sec ot 
s acidise oatt, 



<S6£Viet |Line* "E5ji;eiistBP4SeiVice^i 

biasing" "fcntt Tcari ag" r.uu" "7v\:zi5sB?ipQit' r > 

<Kjap lasMiess laeaciao" "ftEEp; / t ; S2e?/oHt/proeiasts/Tcuri sa3Pi'/> 

</perc> 
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Fig. 16: Extract from the Travel Ageny description file 

An extract of the generated description file 
(SABPEL) is illustrated in figure 19. 

C. Publication layer 

Next step is to publish the SABPEL description of 
the composite service in our CAC-WSR registry. 
The provider has to specify the enterprise 
information, the service information, the web server 
and the description file to load. The BPEL 
document enriched with the supported profile is 
saved in the db-bpel-context table (cf figure 20). 




Fig. 17: juddi Publication of the Flight and hotel web service in 
the CAC-WSRegistry 

entilyKif m pi 

“ * X ii:jiMapa(le.»g:III62a-y^2-tffic-M.. IiriiCiMd Dfdi^fflje^krkspadmEOeVofag&FilECo... 

Fig. 18: Publication of the Travel agency web service in the 
CAC-WSRegistry 

Below, we expose experiment results of the 

publication process in our CAC-WSR registry: 

1 . The first experiment determined the performance 
of the standard UDDI by measuring the speed of 
the publishing function dealing with BPEL 
document. 

2. The second experiment determined the 
performance of the CAC-WSR registry by 
measuring the speed of the publishing function 
dealing with SABPEL document. Figure 21 



demonstrates a comparison between the span of 
time taken to publish BPEL description and the 
one taken to publish SABPEL description. BPEL 
documents used in these experiments were taken 
from the webservicex portal. 




Fig. 21 : Execution time required to publish web services on the 
CAC-WSRegistry 



VIE Related Works 

In this section, we take a look at some research 
works interested in the possibilities of integrating the 
self-adaptive in the Web Service Composition. We 
provide an overview of some of these works. 

Portchelvi and al. [25] proposed a Goal- 
Directed Orchestration (GDO) approach which 
employs an orchestration engine to provide flexibility 
in responding to the changes in dynamic services 
environment. In GDO Process, the user request for 
service is given to the engine and the composition 
request generator generates an abstract goal tree and 
concrete goal tree. The Abstract goal tree is mapped 
to abstract task tree and the concrete goal tree is 
mapped to the concrete task tree and thereby a 
composition plan is generated with abstract service 
descriptions. If services are not available to construct 
a composition plan then the goal cannot be achieved 
and it leads to goal failure. This failure is reported 
back to the composition requestor and an alternate 
sub-goal for the failed sub-goal is found and given to 
the composition plan generator. However, the authors 
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need to formalize this goal-directed approach using 
formal methods. 

Martin [24] deals with the problem of the 
flexible composition by automated planning for 
which he proposes a model. The sequence and the 
choice operators are defined and used to characterize 
flexible plans. Two other operators are then derived 
from the sequence and the choice operators : the 
interleaving and the iteration operators. The author 
refers to this framework in order to define the 
flexibility produced by my planner, Lambda- 
GraphPlan (LGP), which is based on the planning 
graph. The originality of Lambda-GraphPlan is to 
produce iterations. Martin [24] shows that Lambda- 
GraphPlan is very efficient on domains that allow the 
construction of iterative structures. 

Vukovic [22] proposed a framework for 
building context aware applications on-demand, as 
dynamically composed sequences of calls to 
services. She presents the design and 
implementation of a system, which employs goal- 
oriented inferencing for assembling composite 
services, dynamically monitors their execution, and 
adapts applications to deal with contextual changes. 
To handle composition failures, the author 
introduces GoalMorph, a system which transforms 
failed composition requests into alternative ones 
that can be solved. 

Ashraf Butt [23] proposed a Trend-based 
Service Discovery Protocol (TRENDY), a new 
compact and adaptive registry-based Service 
Discovery Protocol with context awareness for the 
Internet of Things (IoT), with more emphasis given 
to constrained networks. TRENDY’S service 
selection mechanism collects and intelligently uses 
the context information to select appropriate 
services for user applications based on the available 
context information of users and services. In 
addition, TRENDY introduces an adaptive timer 
algorithm to minimise control overhead for status 
maintenance, which also reduces energy 
consumption. Its context-aware grouping technique 
divides the network at the application layer, by 
creating location-based groups. This grouping of 



nodes localises the control overhead and provides 
the base for service composition, localised 
aggregation and processing of data. 

Grati and al. [8] proposed a QoS monitoring 
framework QMoDeSV for composite Web services 
implemented using the BPEL process and deployed 
in the Cloud environment. The proposed framework 
is composed of three basic modules to: collect low 
and high level information, analyze the collected 
information, and take corrective actions when SLA 
violations are detected. This framework provides for 
a monitoring approach that modifies neither the 
server nor the client implementation. In addition, its 
monitoring approach is based on composition 
patterns to compute elementary QoS metrics for the 
composed Web service. 

El Haddad and al. [19] presents a QoS 
aware selection method for Web services 
composition called TQoS (Transactional QoS-Driven 
Web Services Composition). The service selection is 
realized based on QoS and transactional 

requirements. Quality requirement is described as a 
set of weights over QoS criteria. Transactional 
requirement is expressed by a risk notion that denotes 
if the results could be compensated or not. In [19], 
five QoS criteria (execution price, execution 
duration, reputation, successful execution rate and 
availability) are used and a local optimization 
method for service selection is proposed. The authors 
also present and formally analyze a service selection 
algorithm based on the workflow patterns and the 
transactional properties of the component services. 

The ASWSCC Method [11] (Adaptation of 
Semantic Web Service Composition to Context) is a 
model which ensures, on the one hand, this model 
allows management and taking into account the user 
context that makes composition process adaptable to 
different instances of use context, which may change 
during the same session. On the other hand, web 
services matching during composition process by 
using domain ontology as lexical database, its 
purpose is to identify, classify and relate in different 
ways semantic content and lexical language. 
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Furno and al. [7] presented a design 
approach based on a semantic model for context 
representation. It is an extension of the OWL-S 
ontology aimed at enriching the expressiveness of 
each section of a typical OWL-S semantic service 
description, by means of context conditions and 
adaptation rules. The model proposed by the authors 
allows to use context-awareness expressions in 
semantic service descriptions and their adoption 
during composition. 

Hermosillo and al. [9] described CEVICHE, 
a framework that combines Complex Event 
Processing (CEP) and Aspect Oriented Programming 
(AOP) to support dynamically adaptable business 
processes. The adaptation logic is defined as aspects 
(reconfiguration component), and adaptation 
situations are specified by CEP rules (monitoring 
component). However, the decision- making is not 
specified as component in this framework. It is 
integrated into the defined aspects. 

As we have shown, work on adaptation at service 
composition modeling is prolific. Different scopes 
are defined as well as different mechanisms for 
achieving such adaptations. 

Cheng and al. [4] presented an automatic 
Web service composition method that deals with 
both input/output compatibility and behavioral 
constraint compatibility of frizzy semantic services. 
First, user input and output requirements are modeled 
as a set of facts and a goal statement in the Horn 
clauses, respectively. A service composition problem 
is transformed into a Horn clause logic reasoning 
problem. Next, a Fuzzy Predicate Petri Net (FPPN) is 
applied to model the Horn clause set, and T-in variant 
technique is used to determine the existence of 
composite services fulfilling the user input/output 
requirements. Then, two algorithms are presented to 
obtain the composite service satisfying behavioral 
constraints, as well as to construct an FPPN model 
that shows the calling order of the selected services. 

Madkour and al. [17] proposed a three- 
phases adaptation approach: firstly they select the 
suitable services to the current context and we 
recommend them to the adaptation process, in the 



service adaptation phase they perform adaptation by 
using fuzzy sets represented with linguistic variables 
and membership degrees to define the user’s context 
and the rules for adopting the policies of 
implementing a service. Finally they deal with the 
complex requirements of the user by the composition 
of the obtained adaptable atomics services. 

A variety of techniques have been proposed in 
literature which integrates existing services based on 
several pieces on information. Although a rich 
landscape in adaptation related researches, a 
complete and generic context-aware composition 
approach is still missing. Table 1 summarises the 
comparison of context aware service composition 
works. Nor all of the identified features are present in 
a single work, as they focused on different research 
problems. Most surveyed architectures employ a 
centralized composition model. They address 
dynamism in the composition, primarily from the 
perspective of unavailability of selected Web 
services, and deal with the issues of how to replace 
them with other equally capable Web services to 
perform the desired task. 

Context aware service composition frameworks 
should support the following features: 

1. The dynamic composition. Static service 
composition involves pre-compilation of the 
composite service prior to a user’s request. 
Dynamic composition is essential for exploiting 
the current state of available services and 
making adaptations based on run time 
parameters, such as bandwidth and the cost of 
executing the various services. 

2. Re- composition. As composite services may be 
executed in a dynamic environment, the context 
may change and services may become 
unavailable. Therefore it is necessary to have 
some means of recomposing the service on the 
fly. 

3. User interaction. Whilst service composition is 
an automated process, it is necessary to allow 
users to provide feedback when they so wish or 
moreover be integrated in the composition 
process. For example, aside from providing 
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input parameters, users may need to guide the 5. 
composition, by selecting the services and re- 
defining the goals or guide the failure recovery 
process. 

4. Automatic service discovery Working with a 
limited domain of services or predefined service 
types limits the potential of service composition. 
Moreover, new services, possibly with new 
capabilities, may become available or existing 
ones may change their functionality. Having an 
automated means of service discovery is 
therefore an essential feature. 



TABLE 1: Main research challenges and features of automatic Web service composition 













Research work 








Feature 


GoalMorph 

[22] 


TRENDY[23] 


QMoDeSV 

[8] 


ASWSC’C 

Method 

[11] 


Furno 
and al. 
[7] 


CEVICHE 

[9] 


Madkour 
and al. 
[17] 


Cheng and 
al. [4] 


Goal- 
Directed 
Orchestratio 
n. [25] 


Lainbda- 
GraphPlan [24] 


Composition 


Context 


6LoWPANs 


Bpel 


State 


State 


Bpel 


Fuzzy 


Fuzzy 


Goal-onented 


GraphPlan 


method 


aware 
goal Model 




extension 


Planner 


Planner 


extension 


Sets 


Predicate 
Petri Net 


Action 

Planning 


extension 


Service 

markup 


OWL-S 


XML 


BPEL4WS 


BPEL4WS 


OWL- 

S 


BPEL 


OWL 


SAWSDL 


ND 


Graph 


Composition 

model 


central 


distributed 


central 


central 


central 


central 


central 


central 


central 


central 


Context 

monitoring 


PDDL- 

based 

context goal 


Directory 
Agent (DA) 


N 


pairs 

(attribute, 

value) 


OWL- 

Ctx 


Complex 

Event 

Processing. 


OWL 


N 


N 


Y 


Dynamic 

composition 


Y 


Y 


ND 


N 


Y 


N 


Y 


N 


Y 


N 


Re- 

composition 


Y 


Y 


N 


N 


Y 


PD 


Y 


ND 


Y 


Y 


User 

interaction 


Y 


Y 


ND 


Y 


Y 


N 


Y 


N 


PD 


N 


Automatic 

service 

Discovery 


Y 


Y 


N 


ND 


N 


N 


N 


ND 


N 


N 



Y-Done; N-Not Done; P- Partially Done; PD-Partially Discussed; ND-Not Discussed. 



Context monitoring. For the purpose of 
supporting dynamic adaptations, software should 
be aware of changes in its context. Context- 
aware systems are concerned with the 
acquisition of context, the abstraction and 
understanding of context, and 
applicationbehavior based on the recognized 
context. 
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VIII. Conclusions and Future Work 

The composition of services can meet the needs 
increasingly complex of user, by a combination of web 
services within a single business process. However, despite 
this widespread adoption of Web services, many obstacles 
prevent their reconciliation in the composition, or may occur 
within a BPEL process in a state change, the context for 
example. To solve this problem of the Web service adaptation 
invoked in a BPEL process and to satisfy the client context 
and web service context. In this paper, we proposed an 
adaptable Web service registry called CAC-WSR. Our 
solution based on SABPEL and AAWS-WSDL to publish a 
context aware composition. It supports saving and searching 
atomic and composite Web services according to adaptation 
criteria together with functional needs. We detailed our 
registry data structure and implementation. Our extension 
shows the importance of context-awareness in Web Service 
composition. 

For future work, we will extend our approach to be 
used in the open world, in which the service composition 
should react to continuous and unanticipated changes in 
complex and uncertain contexts. We will continue to enhance 
our approach in the categories of context modeling, service 
discovery, service adaptation, and service composition. We are 
currently working on a prototype system to evaluate our 
approach by conducting more experiments to examine 
performance metrics including efficiency of service 
composition 
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Abstract — The raised significant evolution of the Internet next 
to the development of the high prevalence of computers, smart 
phones and the Internet on a large scale in most of the trends of 
life, but this use leads to network attacks, with a large use of 
e-commerce, they needs websites on the Internet, e-commerce 
represents a good reason for criminals or attackers to be diverted 
to profit law. Recently, the attackers used botnets to achieve 
their goals. 

A comprehensive study of botnet is done in this paper , study 
a life cycle of botnet, the attack on the behavior , topologies and 
technologies of botnet, studied of Zeus robots (An ethical 
penetration operation has been done using Zeus botnet version 
1.2.7.19 and using Zeus version 2.0.8.9) were is done in in detail 
to determine its characteristics, and be able to detect it in the 
computers on the Internet. 

Host Botnet Detection Software (HBD’s) is designed and 
implemented to detect Zeus botnet in user’s computers. The 
HDB’s depends on information obtained from studying Zeus in 
addition to information obtained from analysis (an analysis of 
Zeus hot has been done by using reverse engineering tool 
(Ollydbg reverse engineering tool)) and penetration operation. In 
order to remove Zeus botnet from victim computers. 

Keywords - Zeus, Host Botnet Detection Software (HBDS), 
Botnet, Ollydbg reverse engineering tool, Zeus botnet version 
1.2.7.19 and using Zeus version 2.0.8. 9 

I. Introduction 

Internet users have been attacked continuously by 
widespread E-mail viruses and worms. However, in recent 
years, a major virus or worm outbreak causing great loss has 
not been seen. This is not because the internet is much more 
secure, but more likely because attackers have shifted their 
attention to compromising and controlling victim computers; 
an attack scheme which provides more potential for personal 
profit and attack capability. This lucrative attack theme has 
produced a large number of botnets in the current internet 
[ 21 ]. 

Botnet now represents a great and dangerous digital crime 
for computer network and internet. Botnet challenges fall into 
two categories, First, botnet can be remote access by 
botmaster to run any command come from botmaster in victim 
computer and return results to him. Second, Botnet can be run 
in victim computer (user’s computers) without user consent. 
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So, many users may have bots run in their computers but they 
did not know anything about it. 

According to Trend Micro, the first bot to appear was in 
1999 and was named PrettyPark. PrettyPark bot implements a 
way of controlling malicious program remotely using IRC 
networks [26] [32]. At the start of 2000, many sites such as 
Yahoo, eBay, Amazon, and CNN have been attacked by a 
Canadian hacker. During 2002, new communication protocols 
for bots/botnets were developed. From 2003, bots used 
different techniques to spread. Nowadays, bots can be 
delivered by most of the ways that current malware can. 
Today, bots are used for extortion, spam and phishing, identity 
theft, and malware seeding [14] [22] [32]. 

Zeus botnet is one of the dangerous botnets nowadays. It 
steals banking accounts for large money profits and it can steal 
any account (user name and password) typed in user computer 
infected by Zeus bot such as mails accounts or social websites 
accounts, etc. Zeus is a family of malicious software that 
focuses on stealing passwords for financial institutions, and 
includes several rootkit components to provide stealth 
capabilities. The Zeus malware, which originated in Russia 
has been in existence since 2007, and was continuously being 
updated and widespread in 2009. It is one of the largest 
botnets in existence, affecting approximately 75,000 
computers in over 200 countries. It is possible to purchase a 
Zeus “bot-maker” kit on underground Internet forums, which 
can be used to generate malware that is distributed to victims 
via drive-by downloads or spam email campaigns [27]. 

The primary goal of the Zeus malware is to steal 
passwords and sensitive information for web-based financial 
accounts, which are then used to transfer stolen money to 
criminals [18] [24]. Zeus is known by many names (ZBOT 
due to its botnet capabilities, WSNPoem, PRG), and others — 
but its use has been particularly criminal [27]. 

Zeus is a malware package that is readily available for 
sale and also traded in underground forums. The package 
contains a builder that can generate a bot executable and Web 
server files (PHP, images, SQL templates) for use as the 
command and control server. While Zbot is a generic back 
door that allows full control by an unauthorized remote user, 
the primary function of Zbot is financial gain — stealing online 
credentials such as FTP, email, online banking, and other 
online passwords [4] [16] [33]. 
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So, the main contribution is to paper, analyze, detect, and 
remove Zeus bot in computer networks and internet. 

II. Related works 

Binkley J. et.al. (2006), present an anomaly-based 
algorithm for detecting internet relay chat (IRC)-based botnet 
meshes. The algorithm combines an IRC mesh detection 
component with TCP scan detection heuristic called the TCP 
work weight. They discussed how to combine TCP-based 
anomaly detection with IRC tokenization and IRC message 
statistics to create a system that can clearly detect client 
botnets. Their system is deployed in their network and works 
well [2] [7] [9] [30]. 

“Peer-to-Peer Botnets: Overview and case” study was a 
research title from Grizzard J. B. et al. in 2007, where they 
present an overview of peer-to-peer botnets. Their case study 
of the Trojan. Peacomm bot demonstrates one implementation 
of peer-to-peer functionality used by a botnet [11] [29]. 

Gu G. et. al. presented in 2007 a prototype system, 
BotSniffer, that can capture spatial-temporal correlation in 
network traffic and utilize statistical algorithms to detect 
botnets with theoretical bounds on the false positive and false 
negative rates. They evaluated BotSniffer using many real- 
world network traces. The results show that BotSniffer can 
detect real-world botnets with high accuracy and has a very 
low false positive rate [5] [15] [31]. 

In 2008, Gu G. et. al. presented a general detection 
framework that is independent of botnet C&C protocol and 
structure, and required no a priori knowledge of botnets (such 
as captured bot binaries and hence the botnet signatures, and 
C&C server names/addresses). In their experimental 
evaluation on many real-world network traces, BotMiner 
showed excellent detection accuracy on various types of 
botnets (including IRC-based, HTTP based, and P2P-based 
botnets) with a very low false positive rate on normal traffic 
[4] [8] [17]. 

In 2008 Erbacher R. et. al. designed a multi-layered 
architecture for the detection of a wide range of existing and 
new botnets. The key advantage of the architecture is that it 
allows for the integration of wide ranging techniques. By 
allowing algorithms from other researchers to be integrated 
through the open architecture, they allow for the greatest 
possible detection strategy [19]. 

“Salvador P. et. al. presented in 2009 a new approach, 
based on neural networks, that is able to detect Zombie PCs 
based on the historical traffic profiles presented by ’’licit” and 
’’illicit” network applications. The evaluation of the proposed 
methodology relies on traffic traces obtained in a controlled 
environment and composed by licit traffic measured from 
normal activity of network applications and malicious traffic 
synthetically generated using the SubSeven backdoor. The 
results obtained the proposed methodology was able to 
achieve good identification results [18] [20]. 

Binsalleeh H. et. al. presented an analysis of the Zeus 
botnet Crime ware Toolkit in 2010. Their analysis aims to 
uncover the various obfuscation levels and shed light on the 
resulting code. In addition, they detail a method to extract the 



encryption key from the malware binary and use that to 
decrypt the network communications and the botnet 
configuration information. They present a detailed reverse 
engineering analysis of the Zeus crimeware toolkit to unveil 
its underlying architecture and enable its mitigation and 
designed a tool to automate the recovery of the encryption key 
and the extraction of the configuration information from the 
binary bot executable [6] [27] . 

Al-Hammadi Y. presented in 2010 a framework as a host- 
based botnets /bots detection system. The detection of botnets 
or bots on the infected machine is performed by correlating 
hots' behavioral attributes. His framework is applied to various 
bots command and control structures such as IRC bots and 
P2P bots. An evaluation of the framework showed success in 
detecting malicious behaviors on the system [32]. 

Zang X. et. al. in 2011 presented a research in order to 
counter the escalation of the botnets evolution. The mining 
based detection methods operated on the flow level internet 
traffic have demonstrated some promising performances. A 
preliminary experiment has been conducted in this paper 
observing the discriminating capabilities of the Hierarchical 
and K mean clustering algorithms and exploring a RTT 
adjustment procedure to mix the botnet trace with the 
background internet traffic [30] . 

In 2011, Wang S. et. al. worked on Tracking Botnets 
using NetFlow and PageRank, where NetFlow related data is 
correlated and a host dependency model is leveraged for 
advanced data mining purposes. They extended the popular 
linkage analysis algorithm PageRank with an additional 
clustering process in order to efficiently detect stealthy botnets 
using peer-to-peer communication infrastructures .The key 
conceptual component in this approach is to analyze 
communication behavioral patterns and to infer potential 
botnet activities [6] [10]. 

In 2012, Nair H. et. al. presented a paper to discuss some 
of the botnet detection techniques and compare their 
advantages, disadvantages and features used in each 
technique. The botnet detection techniques can be classified 
into three types, namely honeypot, passive anomaly analysis 
and based on traffic application [13][ 23 ]. 

In 2012 Alomari E. et. al. presented a comprehensive 
study to show the danger of Botnet-based DDoS attacks on 
application layer, especially on the Web server and the 
increased incidents of such attacks that have evidently 
increased recently. Incidents around the world and revenue 
losses of famous companies and government Web sites are 
also described; indicating that extreme care should be taken 
and a further study should be conducted to assess the size of 
the problem and then derive an optimal solution [3] [18]. 

III. What is zeus 

Zeus is a financial malware. It infects consumer PCs, 
waits for them to log onto a list of targeted banks and financial 
institutions, and then steals their credentials and sends them to 
a remote server in real time. Additionally, it may inject HTME 
into the pages rendered by the browser, so that its own content 
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is displayed together (or instead of) the genuine pages from 
the bank’s web server. Thus, it is able to ask the user to 
divulge more personal information, such as payment card 
number and PIN, one time passwords and TANs, etc. [28]. 

Zeus has two perspectives: 

• From a technical perspective, it is a crime ware tool 
primarily used to steal money. 

• From another perspective, it signals a new wave in 
online criminal business enterprise wherein many 
different organizations cooperate with one another to 
perpetrate outright online theft and fraud. 



2. Additional Tasks : The Zeus has multiple built-in 
commands that can be executed as additional tasks. 
The execution of these commands can be created as 
tasks in the command and control server. When the 
bot connects to the server, the server will see if any 
active tasks should be sent to the bot for execution 
[16]. 

V. Zeus functions's on operating system of victim 

COMPUTER 



Fig. 1 shows how a typical Zeus infection takes place 
[27]. When a victim visits a targeted site, the bot steals the 
credentials that are entered by the victim. Afterward, it posts 
the encrypted information to a drop location that is meant to 
store the bot update reports. This server decrypts the stolen 
information and stores it into a database [6] [27]. The 
communication protocol used is HTTP, specifically HTTP 
POST requests to send stolen data and retrieve commands 
from the C&C server. All data packets are encrypted using 
RC4 encryption to avoid detection [1] [15] [31]. 



Typical phishing infection flow 




Spammed messapes Clicking 

suppoMdly from legitimate embedded link in 

sites arrive m users' inboxes the message leads 

users to a phishing 
sits 

Other phishing infection flow found 





Users unknowingly 
download a malicious 
file from malicious sites 




ZBOT variants are known for logging users' keystrokes to 
steal personally identifiable information (PII). particularly 
financial-related data, which is then sent to cytoercnminals 




Figure 1. Zeus Infection Diagram [27] 

IV. Zeus functions 

Zeus bot has many functions to do. Some functions are 
related to the victim computer information that Zeus deals 
with, and others are related to the operating system. 

Zeus functions are functions on victim computer 
information have main tasks and additional tasks; all of them 
are explained below : 

1. Main tasks : The main purpose of Zeus is to steal 
online credentials as specified by the hacker. Zeus 
performs four main actions (Gathering system 
information, Stealing protected storage information, 
Stealing online credential information as specified 
by a configuration file, Contacting the command and 
control server for additional tasks to perform [16]. 



Because Zbot is a package that is readily available, 
vectors of infection vary widely, with popular methods 
including drive-by download and SPAM. SPAM runs of Zbot 
are a regular occurrence using social engineering tactics, 
impersonating organizations such as the FDIC, IRS, 
MySpace, Facebook, and Microsoft. Once the bot is 
executed, the following actions take place [16] [24]. 

• It adds a process to the path “system32\sdra64.exe”. 

• It sets the previous path to: 
“HKEY_LOCAL_MACHINE\Software\Microsoft\Windo 
wsNT\winlogon\userinit”, so that winlogon.exe spawns 
the process at startup time. 

• It looks for winlogon.exe, increases its privileges, injects 
its code and a string table into this process, and creates a 
thread to execute this code. 

• The main bot executable terminates. 

• The injected code in winlogon injects additional code into 
svchost.exe. 

• It also creates a folder named System \lowsec and puts 
two files inside: local.ds and user.ds. Local. ds is the latest 
dynamic configuration file downloaded from the server. 
User.ds contains stolen credentials and other information 
to be transmitted to the server. 

• The code inside svehost is responsible for network 
communication and third-party process injection required 
to hook Internet-related APIs in order to inject or steal 
information to/from banking sites 

• The communication between these various injected 
components is done with mutexes and pipes, maliciously 
named _AVIRA_x, where x is a number (eg: x=2109 in 
winlogon.exe, x=2108 in svchost.exe) [16]. 

If Zeus is run by using an account that does not have 
Administrator privileges, code will not be injected into 
winlogon.exe, but instead into explorer.exe. Also, instead of 
copying itself to the System folder, the bot will copy itself to 
UserProfile \Application Data\sdra64.exe, and create the 
folder UserProfile \Application Data\lowsec. Finally, the bot 
will create a load point under the registry key: 

HKE Y_CURRENT_USER\Software\Microsoft\W indows\C 
urrentVersion\Run\”userinit”=” UserProfile \Application 
Data\sdra64.exe” [16]. 

Once Zeus is installed, the malware waits until a user logs 
into a financial website that is specified in the configuration 
file. It then injects predetermined code into the browser to 
include additional textboxes for the user to enter sensitive 
information . The configuration file can be customized based 
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on the user’s location and language. An example of this the binary and the settings that will be written into the 

injection is shown in Fig. 2. The malware logs the sensitive configuration file and downloaded at runtime, 

information and transmits it to the botmaster via encrypted 
network traffic [24]. 




Figure 2 Zeus/Zbot Login Form Injection [24] 

Now an explanation of Zbot’s general behavior after it 
has infected a system. Most of Zbot’s data stealing logic is 
driven by the hooks it places inside processes. Here is a quick 
run-down of typical Zbot API hook as seen in table I [12]. 

The ntdll.dll hooks are intended to ensure that the Zbot 
memory resident component is injected into new processes 
and the new process’s API’s are hooked. Most of the rest are 
intended to monitor and steal data that those API’s are used to 
send [12] 

VI. Components of zeus 

The Zeus crimeware toolkit is a set of programs which 
have been designed to setup a botnet over a high- scaled 
networked infrastructure. Generally, the Zeus botnet aims to 
make machines behave as spying agents with the intent of 
getting financial benefits. The Zeus malware has the ability to 
log inputs that are entered by the user as well as to capture and 
alter data that are displayed into web-pages [25]. These 
components are: 

1. Control panel (server): The Zeus Server is also 
remarkably simple to configure. A cybercriminal simply 
drops the Web server files onto his/her machine, looks for 
the install page, and fills in some very basic settings. 
Once set up, this server will receive all of the data Zeus 
bots steal. It also has many other features such as keeping 
tabs on how many infected users there are (based on OS, 
geographical location, and others) and running scripts on 
infected machines, just to name a few as shown in Fig. 3 
[27]. 

2. Configuration file: All Zeus botnets are built based on a 
highly versatile configuration file. This file contains 
settings such as the botnet’s name, the times it will send 
stolen information back, and the server the malware 
should connect to. More importantly, however, it contains 
a list of banks for Zeus to target. Zeus has the ability not 
only to gather all the banking login credentials and 
passwords users enter but also to directly inject extra form 
components into users’ banking website view as 
mentioned earlier [27]. It is divided up into sections that 
start with “entry” with the two main being “entry 
‘StaticConfig’” and “entry ‘DynamicConfig’”. These two 
sections deal with the settings that will be hardcoded into 



TABLE I. TYPICAL ZBOT API HOOK 



Dll Name 


Api Name 1 Dll Name 


Api Name 


ntdll.dll 


NtCreateThread 
(pre Vista) 


user32.dll 


BeginPaint 


ntdll.dll 


NtCreateUserProcess 
(Vista and later) 


user32.dll 


EndPaint 


ntdll.dll 


LdrLoadDll 


user32.dll 


GetDCEx 


kernel32.dll 


GetFileAttributesExW 


user32.dll 


GetDC 


wininet.dll 


HttpSendRequest 


user32.dll 


GetWindowDC 


wininet.dll 


Http S endRequestEx 


user32.dll 


ReleaseDC 


wininet.dll 


IntemetCloseHandle 


user32.dll 


GetUpdateRect 


wininet.dll 


InternetReadFile 


user32.dll 


GetUpdateRgn 


wininet.dll 


InternetReadFileEx 


user32.dll 


GetMessagePos 


wininet.dll 


InternetQueryDataAvail 

able 


user32.dll 


GetCursorPos 


wininet.dll 


HttpQuerylnfo 


user32.dll 


SetCursorPos 


ws2_32.dll 


Closesocket 


user32.dll 


SetCapture 


ws2_32.dll 


Send 


user32.dll 


ReleaseCapture 


ws2_32.dll 


WSASend 


user32.dll 


GetCapture 


user32.dll 


OpenlnputDesktop 


user32.dll 


GetMessage 


Dll Name 


Api Name 


Dll Name 


Api Name 


user32.dll 


SwitchDesktop 


user32.dll 


PeekMessage 


user32.dll 


DefWindowProc 


user32.dll 


TranslateMessage 


user32.dll 


DefDlgProc 


user32.dll 


GetClipboardData 


user32.dll 


DefFrameProc 


crypt32.dll 


PFXImportCertStor 

e 


user32.dll 


DefMDIChildProc 


nspr4.dll 


PR_OpenTCPSocke 

t 


user32.dll 


CallWindowProc 


nspr4.dll 


PR_Close 


user32.dll 


RegisterClass 


nspr4.dll 


PR_Read 


user32.dll 


RegisterClassEx 


nspr4.dll 


PR_Write 




Figure 3. Zeus Server Installation Page 

The static options include timing options (how long to 
wait between attempting to download the config file etc...), the 
URL from which the configuration file is downloaded, and a 
URL that is used to check the external IP address that the hot 
is phoning home from. These will be written into the binary 
when it is distributed [12]. 
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The dynamic options mainly centre on what particular 
web addresses the hot owner wants to target but there are also 
several housekeeping entries, including: 

• A URL from which a new Zbot executable will be 
downloaded. 

• A URL to which stolen data is sent back. 

• A URL at which a further configuration file can be 
downloaded. 

The other dynamic options include: 

• A set of URL masks that enable or disable logging 
for those URLS. 

• A set of URL pairs where one URL is redirected to 
the other URL. 

• A group of URL’s from which TAN’s (Transaction 
Authentication Number) will be harvested. 

• A set of IP/domain pairs that will be written into the 
hosts file to hijack DNS requests. 

• A set of URL masks each with a corresponding block 
of HTML that will be injected into any page whose 
request matches the URL mask (Web Injects) [20]. 

3. Encrypted configuration file: Zeus Builder then takes this 
configuration file and encrypts it. All Zeus bots regularly 
dial home and download the encrypted configuration file to 
see if they have already received new orders, which, of 
course, make security researchers’ jobs more difficult. 
Receiving a copy of an encrypted configuration file does not 
tell researchers anything unless we can also extract the 
encryption key from the corresponding Zeus binary[18]. 

Criminals who use Zeus now take both the encrypted 
configuration file and the Zeus binary, which they created 
with Zeus Builder, and place them on a Web server. Zeus 
allows either each component to be placed on a separate 
Web server or all of its components on a single Web server. 

4. Binary file: when user’s system is infected by the binary, it 
will apply the latest version of its configuration settings and 
begin stealing the user’s personally identifiable information 
(PII). After only a few mouse clicks, cybercriminals can get 
access to a fully functional banking Trojan or botnet [18]. 
The executable can be built with the “Build loader” button. 
The Builder will embed the information needed to retrieve 
and decrypt the configuration file into the Zbot binary 
before packing it with its own custom run-time packer. 
Generally speaking, Zeus customers will then pack the 
executable again with some other packer [20]. 

5. Zeus builder: Zeus Builder is one of the key parts of the 
Zeus toolkit. It is responsible for creating the binary file 
used to make the botnet as well as the configuration file that 
stores all of the botnet’s settings. When a criminal first runs 
Zeus Builder, they are presented with a simple screen that 
shows information about the Zeus version they purchased. 
Interestingly, however, Zeus also checks if the local system 
is currently already infected by the Zeus malware, which 



gives the user an opportunity to remove it practical 
implementation. [27] 



VII. Practical implementation 

Zeus is very difficult to detect even with up-to-date 
antivirus software. This is the primary reason why its malware 
family is considered the largest botnet on the internet. Security 
experts are advising that businesses continue to offer training 
to users to prevent them from clicking hostile or suspicious 
links in emails or on the web while also keeping up with 
antivirus updates. Symantec claims its Symantec Browser 
Protection can prevent "some infection attempts" but it 
remains unclear if modern antivirus software is effective to 
prevent all of its variants from taking root [33]. Removal of 
the Zeus rootkit was confirmed by rebooting and performing 
subsequent scans of corroborating tools, as well as observing 
the lack of certain behaviors, such as the hiding of the 
System32/lowsec directory [24]. 

Fig. 4 represents a flow chart that explains the general 
scheme of the work, starting with botmaster who creates Zeus 
hot using Zeus builder, then this hot used in penetration, 
analysis, detection, removal operations. The penetration 
operation uses Zeus hot and depends on the study. So it results 
in Zeus database creation at the botmaster’ s command and 
control. The analysis process analyzes Zeus hot and creates a 
report with many information about Zeus hot such as CPU, 
registers, and memory information etc... The detection process 
depends on penetration and analysis information besides to our 
paper to give a report that is explains if the user computer 
infected or not. Finally The removal process uses all above 
process information to have the ability to remove Zeus hot 
from user computer and give a report presented to user that 
Zeus has been removed successfully. 

A. Penetration process 

At this stage, penetration processes are done to a victim 
computer by server computer because Zeus botnet work as 
client -server model using http protocol to communicate 
between botmaster and any hot at victim computer. 

To do this, a suitable configuration and installation must 
be done for C&C server which the botmaster uses to control 
botnet network that it manages at server side. 

Fig. 5 explains the main steps for penetration operation 
using windows XP operating system (O.S) and using windows 
seven operating system (O.S) besides to two different versions 
of Zeus hot. Zeus hot version 1.2.7.19 and Zeus hot version 
2 . 0 . 8 . 9 . 

To penetration windows XP operating system Botmaster 
needs the following steps to configure and prepare its 
computer to have the ability to run its command and control: 

1. Penetration windows XP operating system 

Botmaster needs the following steps to configure and 
prepare its computer to have the ability to run its command 
and control: 
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Start 




End 



Figure 4. General Scheme Of Practical Implementation And Analysis 



a) Installing a program that makes any computer as 
(web server) needed. There are many programs to do 
this, such as (Appserv) or (Xampp). In this paper, 
Xampp is used with version 1.7.3 to enable us to do 
this work. Appendix A-l explains the main window 
for Xampp program. 

b) At this step, the server computer of botmaster must 
Listen to a specific IP that represents the server 
computer IP and port number which should be listen 
to, and which is (80) here and represents http protocol 
port that Zeus hot used, after Zeus hot infects the 
victim computer, it sends any collected information 
to the botmaster through command and control run at 
botmaster server’s computer. So, server computer 
must listen to have the ability to get and save the 
information received from victim computer hot. 
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To do this, an edit process will be done to some 
information in the file named (httpd.conf) by adding 
botmaster server computer IP with port number (80). 
This file exists at this path: 

“C: \ xampp \ appserv \ conf \”. Appendix A-2 shows 
an example of this configuration file. 



Start 




End 



Figure 5. Penetration Operation Scheme 

c) In order to manipulate Zeus command and control 
and run it by botmaster at a server computer, a copy 
of a command and control files must be transferred to 
this path: 

“c:\xampp\htdocs\xampp”. Also a suitable folder 
name for these (.php) files such as (botnetserver) 
must be selected because it will be useful later to 
access Zeus command and control by internet 
explorer. Appendix A-3 shows the content of Zeus 
command and control version 1.2.7.19 files. 

d) Running Xampp program by using control panel 
which belongs to Xampp is next step. Two services 
must be activated to work in a right manner as shown 
in Appendix A-4 . These services are first Appserv 
which enables to implement the programming code 
related with command and control written in PHP 
language. Second is to run MySQL service which 
will help to manipulate Zeus botnet Database. Then, 
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operation, victim computer’s operating system is 
windows XP professional service pack 3 as shown in 

Appendix A- 12 . 



an internet browser must be run, and then entered to 
Xampp program through address line. After this, a 
strong password for MySQL must be chosen with the 
super user (root) as a user name because this user 
name has the full permissions to manage database 
such as (add, edit or delete). 

e) The next step is to select “phpmy admin” from 
Xampp program which represents a graphical user 
interface (GUI) to give facilities when working with 
databases instead of writing instructions and line 
commands using MySQL. The great benefit here is to 
create a database for Zeus botnet which will contain 
any information that comes from hot to botmaster as 
shown Appendix A-5. 

f) All above steps prepare a suitable environment to 
install Zeus command and control on botmaster 
server computer and use it. In order to do that, this 
line is written in the address line of the internet 
browser: 

“Your IP\xampp\botnetserver\install\index.php“. As a 
result, Zeus command and control will be displayed 
as shown in Appendix A-6 . So the fields must be 
filled and then the install is chosen. 

After that, Zeus command and control will be 
ready for use as shown in Appendix A-7 . At this 
point, the installation process for Zeus command and 
control at botmaster computer is finished. 

g) In order to build and create Zeus bot, Zeus builder 
toolkit with version 1.2.7.19 will be used. Appendix 
A-8 Shows Zeus builder window. When select 
“builder” to build Zeus bot as shown in Appendix A- 
9, a suitable configuration to Zeus bot configure file 
must be done by “Edit config” option in the builder 
window, then “build config” option will create Zeus 
configuration file (cfg.bin). Appendix A- 10 shows a 
sample of Zeus bot configuration file. This file 
contains the information needed for Zeus bot after 
penetrate victim’s computer to connect with 
botmaster in a right manner, besides a list of websites 
that Zeus dealt with. “Build loader” option creates bot 
execution file (bot.exe) which is used to penetrate 
victim computer by botmaster. 

h) After user’s computer was penetrated and infected, 
botmaster must notice a change on his server 
computer at the option (Total bots) in the command 
and control window, this number represent the 
number of Zeus bot which can the botmaster 
managed at this time. So botmaster can manipulate 
with these bots and managed them and give them any 
command he wants from any of Zeus commands. 
Appendix A-ll shows Zeus control panel with total 
bots =1 which means successful penetration 
operation. 

The operating system of the victim’ s computer can be 
shown at botmaster C&C. when botmaster select 
“OS” option at his C&C; he can know victim 
computer operating system. In this penetration 



2. Penetration windows seven operating system 

Zeus Builder version 2.0. 8.9 can build a bot that can 
penetrate windows seven operating system and steals 
passwords from victim computers that’s used this operating 
system. 

Botmaster needs the following steps to configure and 
prepare his computer in order to penetrate windows seven 
operating system victim computers. 

a) Executing the first five steps explained previously in 
section (Penetration windows XP operating system). 
These steps include a complete configuration for 
botmaster computer to run as a server and communicate 
with the bot successfully. 

b) Opening Zeus 2. 0.8. 9 control panel installer window by 
typing this line in the address line of your browser. 

“ Y our IP/Xamp/botnetserver/install/index.php” . 

3. Penetration experiments and results 

Three main types of experiments were done ethically on 
victim’s computer to get results in botmaster computer, these 
types are financial sites experiments, social sites experiments 
and experiments using virtual keyboard. 

a) Financial site experiments: When user tries to buy his 
needs using internet, he must pay online. Many 
experiments are done, one of them, a user tries to buy an 
antivirus program online, the program’s site presents 
some field that must filled to buy the software such as 
name, address, payment type and card number etc. as 
shown in Fig. 6. After user finish, Zeus bot works and 
send all the information to the botmaster. In order to get 
this information by botmaster he must select “search 
database” option from command and control then open 
the site and get a report from bot. Total number of reports 
always increased by one every time bot sends information 
to botmaster. Fig. 7 shows the report at botmaster C&C 
which explains all information that user entered 
previously at financial site. 

b) Social sites experiments: Many experiments are done on 
Google mail, Facebook, twitter social site. 

c) Experiment using virtual keyboard: Virtual keyboard 
are supplied by windows operating system and some 
website to enter username and passwords in their site 
without using computers keyboard. For example a 
situation when user tries to sign in in the bank of Baghdad 
which provide embedded virtual keyboard as shown in 
Fig. 8 The result is found in a report added to Zeus 
database at botmaster’ s command and control. Fig. 9 
shows this report in database with all information about 
victim computer. 
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Figure 6. Online Payments On A Website 
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Figure 7. Database report of financial site 
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Figure 8. Bank Of Baghdad Sign In Window 
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Figure 9 Bank Of Baghdad Sign In Report In Zeus Database 

B. Zeus bot analysis using reverse engineering tool 

It is very difficult to analyze bots without using reverse 
engineering tools. The analysis operation using reverse 
engineering tool is useful for programmers and professionals 
whose work is with network security research field or work in 
security institutes and security labs. In order to analyze Zeus 
bot, software for reverse engineering tool called “oily debug” 
will be used. Fig. 10 presents the main stages of Zeus bot 

analysis using oily dbg reverse engineering tool. 

In order to analyze Zeus, just select open and select Zeus 
bot which is created previously using Zeus builder version 
1.2.7.19. Fig. 11 explains Ollydbg window after opening Zeus 
bot (bot.exe). As seen, there are many windows that appear in 
this program. Each of them represents some information about 
Zeus bot such as CPU window, memory map window etc... 
Each one of them will be explained in this section Opening the 
bot (bot.exe) results in: 

• New process with Id 00000A90 is created. So this process 
is “sedra64.exe” which the Zeus bot adds to the path 
“C:\windows\system32”. When the host botnet detection 
program for Zeus is designed, this important point must be 
taken to detect Zeus bot. 

• Main thread with Id 00000AB0 is created. This is 
a thread added to winlogin.exe process and created when 
Zeus bot is executed in the victim computer. Then the log 
data window shows a group of (.dll files) that is loaded 
when opening Zeus bot for analysis such as kernal32.dll, 
ntdll.dll and user32.dll. This process creation and thread 
creation operation must be deal with when removing Zeus 
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bot from victim computer or preventing Zeus bot from 
infected users’ computers. 




(IJCSIS) International Journal of Computer Science and Information Security, 

Vol. 13, No. 9, September 2015 
Executable modules window which is shown in Fig. 12 
presents executable run. The first line Zeus bot is shown with 
the base address 00400000 in Hexadecimal with size 19000 
byte, with name bot and path which was opened from it 
previously in addition to Zeus bot entry point 0040A9D9. This 
information gives us Zeus base address in memory with the 
total size and entry point address that Zeus uses when it 
implements in victim’s computers. The rest of the information 
in executable modules is about DLL windows used into the 
system with their paths, versions and entry points for each 
windows DLL system file that has been used. 
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Figure 12. Executable Modules by Zeus 

Memory map can be seen in Fig. 13 which explain Zeus 
memory map in four lines: 




Figure 10. Analysis Operation Scheme 




Figure 11. Oily dbg Window After Opening Zeus Bot 



• At the address 00400000, there is PE header which means 
bot header with size 1000 byte. 

• At the address 00401000, there is code of Zeus which is 
(.text) of the size 11000 byte. 

• At the address 00412000, there is 5000 byte size of data 
containing imports function that Zeus uses. 

• Finally, the rest of the data of Zeus bot begins at 
00417000-00419000. 

All these sections have the permission to read and write at 
corresponding memory locations explained above. 




Figure 13. Zeus Memory Map 



C. Zeus detection software 

Host botnet detection software is designed and 
implemented using C# programing language. This detection 
process is done after studying Zeus botnet in details as shown 
previously in chapter three. Ethical penetration process is 
implemented on computers on internet besides the useful 
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information obtained from analyzing Zeus botnet. Fig. 14 
shows a scheme about Zeus bot detection system. 

The main goal of this (HBDS) is to detect Zeus bot in 
computer network and internet. The two important ideas in 
this software are searching for Zeus botnet files in two main 
locations (windows folders and windows registry). 

After the computer has become a victim and is infected by 
botmaster using Zeus bot, the modification to windows XP 
operating system folders and its registry are explained in Fig. 
15 and 16 respectively. Just scan the computer, and the HBDS 
will work and give a message that explains if this computer is 
infected with Zeus bot or not. If the computer is not infected, 
an advice is given to the user to protect his computer and 
information. If the result of the HBDS is that the computer is 
infected, advice is given to the user to firstly use our removal 
operation steps to clean up his computer. 




Figure 14. Zeus Bot Detection Operation Scheme 




Figure 15. Infected Windows XP Operating System By Zeus 
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Figure 16. Infected Windows Registry By Zeus 



D. Zeus bot removal process 

After detecting the Zeus bot in a victim computer that 
uses internet, a suitable reactive process that represents the 
user action after knowing that his computer was infected will 
be needed. Fig. 17 shows the main steps that is must be 
followed to remove Zeus bot from user’s computers. 

First, Zeus bot must be stopped from working then, it 
must be delete from the victim computer. In order to stop and 
remove Zeus bot, an assistance program will be needed to 
perform this action. The program “process explorer” is used in 
our work. This program is like a task manager with windows 
operating system, but the process explorer shows every task 
running at a time with all threads running and belonging to 
specific process. The Process Explorer is described as an 
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advanced process management utility that picks up where 
Task Manager leaves off. It will show you detailed 
information about a process including its icon, command-line, 
full image path, memory statistics, user account, security 
attributes, and more. When you zoom in on a particular 
process, you can list the DLLs it has loaded or the operating 
system resource handles it has opened. A search capability 
enables you to track down a process that has a resource 
opened, such as a file, directory or Registry key, or to view the 
list of processes that have a DLL loaded. Fig. 18 shows 
process explorer program’s window. 




Figure 17. Removal Operation Scheme 




Figure 18 Process Explorer Program Window 



Users have Zeus hot in their computers must follow these 
steps to remove Zeus from their computers: 

1- First, the user must run the “process explorer”, then 
select winlogon.exe from left pane on process 
explorer and open it. 

2- From winlogon.exe properties windows, the user 
must select threads and “Kemal32.dll” thread then 
press kill. So Zeus thread may be stopped by killing it 
as explained above. This process stops the thread that 
makes Zeus work. After killing this thread 

3- From process explorer window, select “find” then 
type “Sdra64.exe”, then press “search”. After that, 
select “sdra64.exe” process, then right click “select” 
close handle, Zeus hot is completely stopped from 
working in the user computer. 

4- The final step is to remove sdra64.exe from the path 
where it exists when it infects victim computer. So, 
go to the path “c:\windows\system32” and select 
sdra64 and press .Then delete sdra64 key from 
registry from the path: 

“HKEY_LOCAL_MACHINE\Software\Microsoft\Win 
do wsNT\winlogon\userinit” . 

Now, Zeus is completely stopped and removed from 
the user computer. Users must now try to secure their 
computers by using protection system to disallow hackers 
from penetrating the computer. 

VIII. Conclusions 

Botnet is considered one of the large criminal nowadays; 
this is because we cannot keep our computers and their 
operating systems secure enough. Also, many users of 
computer networks and internet do not have enough 
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knowledge about malicious codes foundation and existence. 
So they need great awareness about network attacks and their 
dangerous activities. Others who know that there are many 
malicious codes do not use the proper tool to secure their 
operating systems. 

The conclusions we have arrived at are as follows: 
l.In order to begin with an idea to design a Zeus botnet 
detection system, a penetration process has been performed 
and implemented as a botmaster and as a victim. A network 
was configured as client-server network and penetration 
operation was implemented successfully where a botmaster 
can get anything typed in victim computer such as user 
name and passwords of visa card, PayPal websites, yahoo 
mail accounts, Google mail accounts and Facebook 
accounts, etc... . 

2. After the penetration process, all effects that Zeus bot has 
done to the victim user computer are known. Examples are 
Zeus effect on windows operating system in windows XP 
and windows Seven, effects on windows registry, Zeus 
effect on important dynamic link libraries (.DLL), and other 
files which created or affected by Zeus botnet. 

IX. Future works 

Botnet are new methods which attackers use to infect 
computer networks and internet for some different gains and 
reasons. It may be developed rapidly with the revolution of 
internet and its propagation at last years. According, our 
suggestions for future works are: Implementing a multilayer 
botnet detection system depending on the cooperation process 
between our host botnet detection system and the suggested 
network botnet detection system, designing a botnet 
prevention system and implementing it by speed hardware 
devices in order to take action before bots can infect user’s 
computer in internet. Here new technology with (FPGA) 
devices is useful to implement this idea. 
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Figure A-5 Creation Of Zeus Database 
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Figure A-7 Ready Zeus Command and Control 
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Figure A-10 Configuration File Sample 
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ABSTRACT - Security has been a major concern 
for internet users regarding the potential for harm 
that a breach in their systems can present to the 
landscape. The world is evolving towards an 
internet dependent computing architecture 
considering the advancements in cloud computing 
and other such technologies. They promise various 
benefits mostly relating to the cost of operation for 
the businesses and personal Internet users. This 
investigation has the intention of determining the 
role that cryptography plays in ensuring the safety 
of the systems and the information that people 
share over the internet. The paper looks at the 
types of cryptography that are in broad use in the 
field of information technology and how they 
enable the protection that the users need. There 
are important protocols of protection including 
Internet Protocol Security Standard (IPSec) and 
Field Programmable Gate Arrays (FPGAs) that 
use cryptographic techniques. The investigation 
concludes that cryptography is an important 
contributor to Internet security through the 
implementation of such procedures. 



INTRODUCTION 

The Internet and the World Wide Web 
(WWW) have brought different advantages in the 
field of information technology that have 
consequently enhanced significant processes in that 
area. The two technologies have presented the ability 
of people to share information through different 
platforms in communication and other exercises. 
They are an enabler for people to exchange 
information and other procedures with more 
efficiency in terms of accuracy and speed. More 
developments on the internet and the web have 
allowed users to store data and other forms of 
information on the technologies using such avenues 
as cloud computing. 

Using the internet and the web to complete 
certain tasks has advantages that come with it 
including a reduction in operational costs ([6]). They 
lower the requirement of particular hardware and 



Infrastructure for the users by providing a virtual 
environment that facilitates their activities in various 
ways. For example, cloud computing eliminates the 
need for storage devices by providing the virtual 
space for the users to store their data and launch their 
programs. Despite reducing the cost of purchasing 
the hardware, it also eliminates the expenditure on 
maintenance schedules [20]. More benefits associate 
with the use of these technologies that have enabled 
them to replace most traditional requirements in the 
computing arena. 

Security of the information on the internet 
platforms has been a major concern in the recent past 
to the users and almost overshadows the benefits they 
present ([27]). There is a global increase in the 
number of people that wish to use the internet and the 
web to handle different tasks ranging from research 
to education and professional practice [13]. Other 
individuals with malicious intent are taking 
advantage of the increase in traffic and raising havoc 
on the users through various cyber attack techniques 
[1]. There has been an engagement of different 
researchers and developers in information technology 
into the objective of developing security procedures 
that can enable the users to protect their information 
from these predators. The growth of networks to 
share data between users increases the risks for the 
participants as they may have differing objectives and 
some of them malicious. The techniques that the 
developers make today need to evolve each time and 
advance in their implementation to adjust to the 
problems of the future [15]. The evolution is 
necessary because the threats keep changing and 
violating the protocols through their innovations. 

Security has been a primary concern for 
different users of the Internet and the World Wide 
Web due to the dangers that the breaches in their 
information can present. Various businesses have 
termed the safety of their data as the central concern 
and limiter for taking advantage of the benefits that 
the two technologies present. They have to ensure the 
protection of that information because its impact can 
spill over and affect their clients and other persons. 
The users will be liable to more people due to the 
breaches and prefer to abstain from availing their 
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data on the internet platforms. This research will take 
a look at the issue of security on the World Wide 
Web and the situation around the globe. The main 
area of concern is the protocols of transferring and 
storing information and data over the internet that are 
the primary enablers of communication and cloud 
computing. The research will investigate the role of 
cryptography on the web and network security. The 
procedure has different angles of approach that may 
be good or wrong for the safety of the users. 
Therefore, the study will establish the contexts of 
application of cryptographic techniques, and their 
benefits and disadvantages on the internet. 

LITERATURE REVIEW 

What is Cryptography? 

Cryptography is an ancient technique that 
did not begin with the computing age. The 
applications in the area of information technology 
derive from the classical communication between 
two parties who used different techniques to conceal 
their messages [3]. It involves the use of certain 
codes to hide the meaning of the message from other 
parties besides the sender and the recipient [11]. The 
method was in widespread use during periods of wars 
when different warriors and commanders would use 
it to protect their information from their opponents 
and other opposing sympathizers. It is the 
cornerstone of security protocols in modern times 
with applications in computing. Most users apply the 
techniques to protect their information and data from 
people who use it without permission from the 
implementers. The technology is necessary for both 
open and closed networks to safeguard the processes 
of data transmission. A modern definition of 
cryptography is a technique that uses algorithms to 
conceal the context of a message from other parties 
outside the sender and receiver ([8]). Therefore, it is a 
form of secret communication between two parties 
that prevents unintentional access to particular 
information. 

Modern cryptography uses sophisticated 
mathematical and programming algorithms to attain 
its objectives of concealing messages from the wrong 
audiences. There are various components that the 
techniques will apply. So, the algorithms are one of 
them, and the others include digital signatures that 
the users implement to notify the recipients of the 
messages that are pertinent to them [21]. The 
components of cryptographic procedure are necessary 
to ensure the efficiency of the communications and 
their security. Take the example where one party 
sends a secret message, another person with the 
ability to retrieve it can deduce the meaning and 
perform an action they do not intend. Therefore, they 



need techniques that can help them differentiate their 
communication to ensure the measures that follow 
are relevant. Some other components include hash 
key functions and ciphers. In summary, the elements 
begin with the plain text or direct message that the 
communication under cryptography is carrying. The 
plaintext goes through an encryption algorithm that 
turns it into a secret communication called a cipher 
[23]. The algorithm uses a secret key that is a set of 
codes that defines their hidden messages and 
transforms the plaintext into the cipher. The 
understanding of the secret keys is necessary to 
retrieve the messages from the cipher in a process 
called decryption using various techniques and 
algorithms. It is this action that presents a security 
challenge in most cases. Most people breach the 
ciphers by identifying weaknesses that they exploit to 
decrypt the message and abuse its application to the 
recipients, senders, or other participants. 

The various procedures that people use to 
develop secret codes and retrieve their means have 
the name cryptanalysis [4]. Cryptanalysis is the study 
of techniques and principles that develop codes of 
encrypting communications and breaking them to 
retrieve the messages [9]. The letters that go through 
these coding processes are called ciphers, and they 
need the person to practice with the relevant 
cryptographic technique for retrieval of the 
communications they have. There are different 
procedures for retrieving the ciphers including 
decryption that are at the disposal of the particular 
person regarding their training and practice. 
Combining the aspects of cryptography and 
cryptanalysis presents another concept called 
cryptology. Therefore, the whole process of 
practicing cryptographic techniques and breaking the 
codes has an umbrella term that is cryptology. 

Types of Cryptography in Internet Security 

There are two main techniques of 
cryptography; Symmetric and Asymmetric 
encryption [7]. Their primary difference lies in the 
manner of encrypting and decrypting the messages 
that the developers input into the communication. 
The next section discusses the two types of 
cryptography and their applications. 

( 1 ) Symmetric Encryption 

Symmetric encryption is a secure 
communication procedure that uses a single secret 
key to developing the encryption method and 
decrypts the message from the ciphers [14]. 
Therefore, the sender and their recipient will share 
the secret keys and use similar methods of decoding 
the message in their secure communication. The 
primary goal of using this technique between two 
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parties is to ensure that there are high levels of 
privacy for the information they share because there 
are only two ends with access to it. The technique 
presents the benefit of reducing the probability of 
breaches by eliminating the existence of third parties 
that can access the communication they launch 
between them [2]. The two sides in the transmission 
will also share a key-generation algorithm that 
dictates both of them on how to encrypt and decrypt 
their information. This form of cryptography is the 
most conventional form that has seen broad 
applications in different fields including espionage in 
the ancient periods. The nature of the symmetric 
encryption procedure implies that it has three 
algorithms that all the users implement to ensure 
secure connections between them. The first is the 
key-generation algorithm that gives them the method 
of hiding and retrieving the message. It then opens up 
to the encryption algorithm that informs the sender 
on the method they will use to turn their message into 
a cipher. The third is the decryption algorithm that 
tells the receiver the procedure that will enable them 
to get the plain message from the cipher that they get 
from the sender. 

One of the interesting facts is that users have 
to share the same secret keys of encryption and 
decryption if their number increases in the channel of 
communication [17]. They will not develop different 
keys when the number goes beyond one sender and 
receiver respectively. It implies that those on the 
recipient side will share the same decryption key and 
those on the sending one will distribute the 
encryption algorithm as well. The sharing makes the 
system more susceptible because it presents an 
obvious procedure to more parties. An interception 
on the communication is much easier since an access 
to one decryption key leaves all recipients vulnerable 
as the interceptor can now access any information. It 
is among the weaknesses of a symmetric 
communication strategy between parties. The 
challenge implies that if a person has access to any of 
the algorithms, they can transmit messages from any 
of the genuine participants and infect their channels. 
They can input misleading information to them and 
impact their protocols adversely. 

One of the most common techniques using 
the symmetric approach is the Data Encryption 
Standard (DES) that IBM developed for the 
government of the United States [19]. The procedure 
uses a 56-bit key and a 64-bit lock in its 
communication strategy. The level of security for this 
process was susceptible to various interceptions that 
led to an ease of decryption. One of the system’s 
susceptibilities is that it is open to brute force 
techniques of cracking. Here, a person uses all the 
possible combinations of letters and numbers to gain 



access to the system. The authorities engaged 
designers into developing a system that would 
increase the number of combinations that the system 
required to reduce the susceptibility. One of the 
newer techniques that were safer than the DES is the 
Advanced Encryption Standard (AES) that requires 
128, 192 and 256-bit keys in its system [12]. The 
older DES system has more vulnerability for data that 
needs high security; therefore, most developers 
recommend against it. The strength of the secret 
algorithm is still susceptible due to its small 
combination requirements and procedures. The use of 
a similar key for encrypting and decrypting is the 
main challenge because someone who has access to 
either can control the activities of the system. 

(2) Asymmetric Encryption 

Asymmetric Encryption advances from the 
symmetric approach by developing different keys for 
encrypting and decrypting a system message [25]. 
This provision makes it better than the symmetric 
system of communication that is more susceptible to 
breaches. The use differing keys in the creation of the 
cipher and retrieving the messages, the technique 
reduces the probability of someone using either of 
them to outline the objective of any of them. For 
example, an unauthorized person cannot use the 
encryption key to retrieve the message in the 
communication. Therefore, the system stays secure 
from interceptions as a person with the ability to 
develop one of the two keys cannot control the 
system using their strategy. The requirement for 
implementing a communication is that one needs to 
obtain a secret key that helps them to generate a 
cipher and sends it to somebody else. The recipient 
also has another secret key that is pertinent to the 
authorizations of the communication channel that 
will allow them to retrieve the message and use it. 
Any other person cannot use their key to get the 
message if it is not for them as only those that the 
encryption algorithm authorizes can access the plain 
message from the cipher. 

The main advantage that this strategy 
presents is that a sender can develop a key that is 
unique to them and uses it to generate messages and 
control who reads them [24]. The sender can define 
the exact recipients by limiting the access of that 
information to a particular recipient by selecting the 
specific algorithms about the key of the consignee. 
Therefore, the receiver of that communication also 
has a unique decryption key that eliminates any 
external access to the communications that come to 
them. Therefore, the messages in the channel have a 
one-way approach since the keys that the sender has 
cannot decrypt the message from the cipher that they 
develop [5]. They cannot tamper with the original 
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communication in that particular piece of information 
unless they develop another one. Their encryption 
key has no control over any messages that they have 
developed from their systems. This move further 
reduces the susceptibility of the communication 
channel. Besides reducing the probability of 
interceptions, the interceptors have no control over 
the information that is in the message. 

When using an asymmetric approach to 
communication, there are two types of keys that the 
system needs. They are the private and public keys. 
The sender has to visit a directory that contains the 
public keys that are the identifiers of the recipients 
they wish to contact [22]. They encrypt their 
messages and place them in the directory to reach the 
receiver through their public identifier algorithms. 
The recipients of the communication have private 
keys that are unique to them and are not allowed to 
disclose them to any other persons [16]. They use 
those secret keys to access the message that comes to 
them, and the algorithm retrieves the information that 
it contains. No other person can receive the message 
and get the information out of the cipher because it is 
unique to only one receiver. Even the sender has no 
ability to retrieve that information because they do 
not have the private algorithm that the systems accept 
to decrypt it. Any person that intercepts the 
communication must have access to the private key 
to enable them to retrieve the connection. Therefore, 
it is harder to use such techniques as brute force to 
guess the cipher’s decryption strategy as there are no 
hints from the sender’s side that can help. It widens 
the number of guesses one has to make to achieve 
their objective. It is like trying to get three factors of 
a product of numbers when they are more than two 
and one does not have any of them to begin with 
during the process. It is important to note that there is 
a link between the public and private keys in their 
algorithms that enable them to work together. That 
connection in their models allows them to 
differentiate between each other to avoid contact with 
the wrong persons in the communication. 

The Role of Cryptography in Internet and Network 
Security 

Cryptography is an important provision for 
ensuring security over networks that users intend to 
use for sharing information. If the data and 
information are in motion over the networks, then 
they are susceptible to interceptions from people 
outside the legitimate parties in the channel [10]. 
There are different implementations that developers 
and users utilize to secure their transmissions over 
the internet. The large traffic that is available on the 
internet and the vast size of information that is on the 
World Wide Web present various safety challenges to 



both creators and users. The automation of 
computing processes to perform particular tasks can 
further the problem by leaving the control open to 
any person in the event of an interception on the 
networks. Cryptography plays different significant 
roles in preventing unauthorized access to computing 
systems aiming to protect the users from the harm a 
breach may intend to bring. The applications of the 
two techniques are important to secure the 
information that people share over the internet using 
methodologies that prevent access to it until the 
recipient gets it. Other strategies have the objectives 
of lowering the susceptibility of unauthorized access 
to the computing systems and in turn protecting the 
information on them. 

Password-Authenticated Key (PAK) 
Exchanges are one of the methodologies that 
developers use to protect internet users from 
unauthorized access to their systems and databases 
([18]). Cryptography proposes different solutions to 
the problems that pertain to the use of such problems 
by making the information that they share unique to 
the senders and receivers. PAK procedures just 
protect people from accessing the whole system but 
do not promise any security on the information in the 
event that an unauthorized party gets. Cryptography 
presents a further protection from the superficial 
hindrance that passwords present to the users ([18]). 
The passwords are a primitive approach towards 
cryptography but are susceptible to attacks due to the 
different advancement people have made in 
computing. Therefore, they will require an extra step 
to prevent an illegal use of the information that the 
adversary accesses after the attack. 

There are other security protocols that the 
World Wide Web utilizes that apply the principles 
and techniques of cryptography to achieve the 
security goal. Among the strategies are the Internet 
Protocol Security Standard (IPSec) and Field 
Programmable Gate Arrays (FPGAs) that ensure the 
safety of the transmissions that users make over the 
internet. They both use the cryptographic approach to 
ensure that the access to the databases on the Web is 
under the authority of the vendors and report any 
malicious activity [28]. They enable the initiation of 
security procedures to protect the information on the 
internet and inform the users about the threats to take 
further protective measures. They are a progressive 
advancement of the primitive preventive strategies by 
offering the extra alert features that can enable the 
users to protect their information once an outsider 
tries to breach the systems. Therefore, cryptography 
is significant in two levels of implementation for 
internet users [26]. The first area is through the 
protocols of transmission that enable the users to 
access the internet and the databases that are 
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available on the web. The second level is ensuring 
the protection of the databases by controlling the 
access to the users by limiting certain information to 
particular parties alone. 

CONCLUSION 

Cryptography is an important art and science 
for ensuring the security of different protocols and 
infrastructure that users implement on the internet 
and the World Wide Web. The increasing attention 
towards the potential of the internet is motivating 
more people into its utilization. The increase in traffic 
presents a challenge to different vendors especially 
concerning the safety of the information that the 
users share and store on the Web. There is an 
increase in the security interest because the growing 
utility also presents the problem of growing 
cybercrime on the platforms with different intentions 
behind its motivation. The security procedures have 
to evolve on a regular basis to catch up with the 
changing face of the intruders on the systems and 
their innovation in malicious techniques. 
Cryptography presents a solution to this problem by 
allowing the developers to make protocols and 
mechanisms that can enable uniqueness in the access 
of information through the internet. They aim to 
differentiate the users by developing techniques that 
can enable the system administrators to detect any 
malicious activity and its location. The cryptographic 
techniques also present an important layer of 
protection that allows authorization procedures by 
preventing illegal access and use of information. 
With the various advancements in the computing 
industry, more protective strategies are a sure 
promise for the future. As much as it is almost 
impossible to prevent a cyber attack, the future looks 
towards a system where the users have a notification 
of an ongoing breach to enable them to protect their 
information. 
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Abstract — Biometric based identifications are 
widely used for individuals personnel 
identification in recognition system. The 
unimodal recognition systems currently suffer 
from noisy data, spoofing attacks, biometric 
sensor data quality and many more. Robust 
personnel recognition can be achieved 
considering multimodal biometric traits. In this 
paper the LDA(Linear Discriminate analysis) 
based Personnel Authentication using Finger 
vein and Face Images (LDA-PAFF) is 
introduced considering the Finger Vein and Face 
biometric traits. The Magnitude and Phase 
features obtained from Gabor Kernels is 
considered to define the biometric traits of 
personnel. The biometric feature space is 
reduced using Fischer Score and Linear 
Discriminate Analysis. Personnel recognition is 
achieved using the weighted K-nearest neighbor 
classifier. The experimental study presented in 
the paper considers the (Group of Machine 
Learning and Applications, Shandong 
University-Homologous Multimodal Traits) 
SDUMLA-HMT multimodal biometric dataset. 
The performance of the LDA-PAFF is compared 
with the existing recognition systems and the 
performance improvement is proved through the 
results obtained. 

Key words-SDUMLA_HMT ; LDA-PAFF; Phase; 
Magnitude; fisher Score 

I. Introduction 

The use of biometrics to identify personnel is 
widely adopted in our day-to-day scenario. A 
biometric recognition system identifies an each 
personnel using one or more specific physiological 
characteristics possessed by the individuals [1], If 
one physiological characteristics is considered for 



recognition then they are termed as unimodal 
recognition systems. When multiple or a 
combination of personnel biometrics are considered 
then they are termed as multimodal biometric 
recognition systems. Enrollment and verification of 
authorized personnel are the important functions of 
the recognition systems. The recognition systems 
enroll authorized personnel based on the data 
provided from the biometric sensors and store the 
data for future verification or matching. During 
verification the recognition systems validates with 
the existing whether the biometric data presented is 
valid or invalid. Predominantly unimodal systems 
are adopted for personnel identification [2]. 

Key challenges in unimodal biometic systems: 

The unimodal biometric recognition systems 
currently used in day-to-day activities suffers from 
large number of drawbacks [2] [3] [4]. Biometric 
recognition systems solely rely on the data provided 
in the biometric sensors. The data input provided to 
the recognition systems from the sensors are 
generally noisy in nature which can affect the 
verification results and also cause faulty enrollment 
techniques. The illumination variation for face 
recognition systems is one such example. 
Interpersonal biometric similarities is another 
drawback of unimodal biometric systems [4]. 
Unimodal biometric recognition system presented 
in the research work using the finger print [5] it 
clearly illustrates the biometric similarity problem. 
Spoofing attacks are the common causes in 
unimodal recognition systems. Spoofing attacks are 
commonly noticed when biometrics like signature, 
voice, face and finger prints are considered [2] in 
the recognition system. 

Motivation: 

Multimodal biometric recognition systems is used 
to overcome the drawbacks of the unimodal 
recognition systems and have proved to be 
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successful [6] [7]. The LDA-PAFF is a multimodal 
recognition system. Limited work has been carried 
out by researchers considering a comprehensive set 
of biometric traits of personnel. The research work 
carried out by other researchers considers either the 
magnitude features [8] [9] [10] [1 1] or the phase 
features[12] [13] [14] [15]. 

Contribution: 

Considering the research findings, a LDA based 
Personnel Authentication using Finger vein and 
Face Images (LDA-PAFF) are introduced in this 
paper. The state of art work presented by Shekar et 
al., [7] considers the Iris, Finger print and Face 
biometrics for recognition. In LDA-PAFF the 
finger vein and face biometric is considered for 
recognition. In LDA-PAFF, the personnel are 
identified on the basis of the Gabor kernel features 
extracted. To enable efficient feature extraction and 
recognition the biometric data obtained from the 
sensors are pre processed to obtain the region of 
interest(ROI) for the considered biometric traits. On 
obtaining the fip /, data feature extraction is 
performed using Gabor kernels. The novelty of 
LDA-PAFF is that both the phase features and 
magnitude feature are considered. 

Organization: 

The manuscript is organized as follows. Section 2 
discusses the related work. The background is 
discussed in Section 3. The LDA-PAFF proposed 
is presented in Section 4. The penultimate section 
of the manuscript discusses the experimental work 
and results obtained. The conclusions are drawn in 
the last section. 

2. Related Work 

Many number of researches have been done till 
these days for human traits based biometric 
identification system where some are emphasized 
for multi model consideration while taking into 
account of performance and classification accuracy 
as prime objectives. Some of them are as follows: 

Monwar M et al., [13] develop a multimodal 
biometric system using Fisher Extraction Scheme 
on the basis of PCA and Fisher's linear discriminant 
(FLD) approach which do employs face, ear and 
signature for identification. They employed rank- 



level fusion process and used Borda count paradigm 
(combination of ranks for individual model) and 
logistic regression technique. This system exhibited 
that the fusion of varied models could lead to 
performance enhancement. 

Dinakardas C et al., [14] in their system 
developed a multimodal face recognition system 
using fusion of results from PCA, fisherface as well 
as minutia extraction with LBP feature extraction 
for varied biometric traction. The authors 
emphasized system optimization for accuracy of 
recognition. 

Jihyeon Jang et al., [15] developed a multiple 
biometric system taking into consideration of non- 
linear classifiers and derived varied score vectors 
which was classified using SVM, Kernel Fisher 
Discriminant (KFD) and further by Bayesian 
Classifier. They exhibited their system functional 
efficiently with multi-model architecture. 

Jian Yang et al., [16] emphasized their research for 
projection using unsupervised discriminant 
projection (UDP) scheme for reducing 

dimensionality of high-dimensional data in certain 
defined small sample size cases. The uniqueness of 
this system was that it (UDP) characterizes the local 
scatter factor as well as the available nonlocal 
scatter, requiring estimation of certain data 
projection which could optimize the nonlocal 
scatter simultaneously. They employed this system 
with face and palm based biometric identification. 
The authors advocated their system to be used for 
real time biometrics utilities. 

3. Background Work 

This is the fact that a number of researcher have 
made effort to enhance the system performance and 
among them Jiwen Lu et al., [22] have employed 
cost sensitive analysis paradigm for face 
recognition. In general the traditional subspace 
oriented face identification approaches need lower 
dimensional feature subspaces for accomplishing 
higher accuracy of classification[24][25][26][27]. 
In fact such kind of assumptions could not possess 
effectiveness in varied circumstances. In [22], the 
developed system employs a cost matrix factor 
which specifies varied cost factors in relation with 
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different types of misclassification, and for this the 
author have taken discriminative subspace analysis 
approach into consideration, of which was further 
devised into the cost-sensitive linear discriminant 
analysis (CSLDA). Later a cost- sensitive marginal 
fisher analysis (CSMFA) approach was employed 
for accomplishing the value of minimum 
identification loss by exhibiting identification with 
learned low-dimensional subspaces. 

In order to enhance the system by exploiting 
complementary details from multiple extracted 
features they proposed a multi-view cost sensitive 
subspace analysis scheme that needs a common 
feature subspace for fusing multiple features. In fact 
this work was an enhanced form of [23] which has 
already employed certain cost-sensitive PCA and 
LPP (CSLPP) approach for face identification. On 
the other hand generic PCA and LPP approaches 
are unsupervised and author enhanced it with 
supervised, which resulted into better results. In 
their work they have enriched the system with two 
discriminative subspace analysis approach called 
(LDA and marginal Fisher analysis (MFA). Some 
other works such as [14] [15] [21] have also 
emphasized their system for multimodal biometric 
application and have tried to function on reduced 
dimensionality with linear subspaces. 

This is the matter of fact that the existing 
approaches have performed better, but taking into 
consideration of varied critical human traits and 
associated real time circumstances such as lighting, 
contrast and orientation, major systems gets limited 
to exhibit better. On the contrary the 
implementation of traditional LDA doesn’t ensure 
optimal results. Therefore these requirements 
become a motivation for this present research and 
we have proposed a highly robust and efficient 
system employing phase congruency with Gabor 
extraction, fisher matrix enriched with LDA 
paradigm and the system has been further optimized 
with K-nearest neighbor classification system 
which makes the system optimal in terms of 
accuracy, efficiency and overall performance. 

4. Proposed Model 

The proposed architecture for LDA based personal 
authentication using finger vein and face images is 
LDA-PAFF as shown in Fig.l. 



Let us consider a multimodal biometric dataset of 
P personnel. There exists a classification problem 
of P personnel to be identified based on 
their E biometric feature set. The biometric feature 
set consisting of Finger Vein and Face can be 
defined as 



IE- 



Where V* is the phase features for the finger vein 
and F* represents the feature set of the face 
biometric. 

The LDA-PAFF proposed in this paper considers 
primarily Two biometric features of P personnel 
namely the finger vein ta5 and frontal face ® . Pre- 
processing is adopted on all the raw biometric 
images to obtain the regions of interest {RWj . 
The identification procedures adopted is 

discussed in the future section of the paper. 
The SVi identified for finger vein and face are 
represented as F.f in the remaining manuscript. 
The use of Gabor kernels is considered for phase 
congruency feature extraction i.e. 




Multimodal Biometric Dataset 




Preprocessing and ROI identification 



GaborPhasc Fea:ure Extraction 
Gabcr F&atire Fusion 

Dimension fietluttbr using .QA and Linear Projection 
ItearestNeighbor Classifier 



Fig. 1. System Architecture 
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4.1. Finger Vein Biometric F-ROI Identification 



The finger vein biometric image set v can be 
represented as 



V ■ f Yg ,■ ¥ig III II 



( 2 ) 



For precise &Q1 extraction of finger veins, in LDA- 
PAFF grey scaling, edge detection, JEtfJ area 
normalization and greyscale normalization 
techniques are adopted. The grey scaling 
operation[16] for an image 



v n e -y can be defined as 


F.4 FS) 


(3) 


A=|g,5S7Q x fffofc#)], B=|g.ii4og xsfetf,/# 





the face 5W image the vector <4 - S&j, a it ,« , needs 
to be computed, where o,- represents the level to 
which the particular pixel s tft is assigned to. The 
variable o £ accepts values from the level 
set £ ■ ixv.Vf v T j. The level set X comprises only 
two possible levels, either Face or Non-face. The 
subsequent probability for a for the given b is 
represented as and is computed using 










( 6 ) 



Using the probability 1 £b[$ the energy facto 
{SlO for a face or non-face level is 



EF{a) = (- taglia F tf) = Mo, &J F ifrtoJ F If} 



(7) 



Where and sfofc/j) represent 

the red, green and blue channel values on the pixel 
at the location . The dimensions of the 
image v Ti are represented as q x h .We have 
employed the Sobel operator for edge detection on 
the image with a masking scale 

of 3 ft 3. The mask utilized is 





-1 0 ]■ 








-Z 0 l 
l-l g lJ 




(4) 



Where and 

K represents a constant. 

The greyscale face dataset constructed is 
defined as 



P ■ J(P| (PpJ 



4.3. Multimodal Biometric Feature Extraction 
Using Gabor Filters and Fusion Set Creation 



The Edge detected images vary in size. To 
normalize the size of the image to 12S ft 12*, bilinear 
interpolation is adopted [17]. 

4.2. Face Biometric F-ROI Identification 



In LDA-PAFF, the ROI of the face is identified by 
localization, image segmentation, face region 
classification and non-face region classifications 
techniques. The face image dataset (Fof number of 
personnel can be defined as 



r- 0r L ,%n 



(5) 



Consider an image ff represented as a 
vector b ■ h f \- where h- L is the color of i tk 

pixel and #is the total number of pixels. To obtain 



The use of Gabor kernels for feature extraction 
have proved to be robust and efficient in personnel 
biometrics identification systems [18]. In LDA- 
PAFF the use of Gabor kernels for feature 
extraction from the multimodal SQI image datasets 
of Finger Vein and Face is adopted. The magnitude 
features and the phase features have been 
considered to define the S9t images. The Gabor 
kernels are complex band limited filters that enable 
fine grained localization in the frequency and 
spatial domain [19]. For a confined frequency band 
the Gabor kernels enable robust feature extractions 
in terms of spatially local features, orientation 
features and multi resolutional features. The Gabor 
features extracted efficiently negate the varied 
environmental conditions changes occurring due to 
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illumination, intensity, position and orientations. 
The Gabor kernels relate to the simple cells of the 
mammalian visual cortex and are thus are relevant 
from the biological point of few as well [20]. 

Let us consider an RQl image represented 
as £j| where ! *•*' a ¥ ■ P. If the orientation is 

center frequency is then the Gabor kernel is 
represented by . The feature extraction 

process in LDA-PAFF is achieved by performing 
the filtering operation on utilizing the 

kernel function of size s and orientation □ 
represented as . The feature extraction 

function ffit a ¥ II T can be defined 

as 



a* = /""fa, fi) * i) 


(9) 


The features obtained are complex in nature 

and consist of the real and imaginary components 


defined as 




Iff - Iff] - « ?4«Co, Iff] 


(10) 




Iff - tf] ■ it ^ Iff] 


(11) 



It can be observed that the feature vectors obtained 
for the finger vein and face biometric possess same 
dimensions and a simple union method is adopted 
in the LDA-PAFF to create the feature fusion set. 
The feature fusion set E can be defined as 



Hi - u<g> 



( 12 ) 



4.4. Feature Sub Space Dimensional Reduction 

The fusion datasets E consists of a large number of 
h data points. The large dimensions of the 
set E induce huge computational and space 
requirements for personnel classification in the 
LDA-PAFF. The data available in the set E is 
considered to encompass # points in c clusters. 
Each cluster represents a personnel a P and is a 
subspace in the space B k . Each data point can be 
represented as 



flft 


(13) 


Where is the Gabor feature and &. m a ^ . The 

k m ) k 


class assignment variable is represented as 
The Gabor Feature matrix £ £ 
represented as 


can be 


S m 


(14) 



To reduce the dimensions of the subspace 
projection the use of Fisher Scores and Linear 
Discriminate Analysis is considered in the LDA- 
PAFF. The Fischer scores [21] enable dimensional 
reduction. In addition the Fischer Scores optimize 
the subspace projections by increasing the inter 
cluster distances and reducing the intra cluster 
distances. The Linear Discriminate Analysis assists 
in feature combinations and enables accurate 
projections of the subspaces [22]. 

4.5. Classification Using K-Nearest Neighbor 

Let the set T represent the 

training set. The training vector 

£ SP where is the Gabor feature 
set representing the j? l-F class. The training 
set T is considered as the dataset of the registered 
:p personnel enrolled in the LDA-PAFF. Let 
represent the unknown or 
testing dataset and If < 37 . Similar to the training set 
the testing set vector can be represented as 
tq, ■ KftvPvBFv S y with the class variable p v is 
treated as an unknown. The Gabor feature set of the 
training or testing sets is represented as 

a *' 111 flraJ- 

To identify the unknown class in the test data U the 
use of Weighted K Nearest Neighbor Classifier is 
adopted in the LDA-PAFF . To classify the 
vectors i y 7 the Weighted K Nearest Neighbor 
ranks the Gabor features of the test vector amongst 
the Gabor features of the training vectors. Using the 
rank and the known !P classes of the train data the 
classifier predicts the unknown personnel class of 
the test vector using the personnel classes of the 
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similar neighbors. The similarity amongst the test Environmental illumination variations considered 
and train vectors is computed using data is also provided in the face dataset. 




Where w is the weight vector, r represents the total 
number of Gabor Kernel features of the biometric 
feature under consideration. 



The dataset available is split into training and 
testing data i.e. 7-11 . Equal number of train and test 
images are considered in the finger vein and face 
data. The dataset used and the construction of the 
test and train data is summarized in Table 1. 

Table 1: SDUMLA-HMT DATA SET 
PARAMETERS CONSIDERD 



A weighing or scoring operation is performed to 
identify the nearest neighbors of the test vector 
using the similarity matrix. 




Where is the nearest neighbors of the 
unknown test vector u p , 

classification index of the 
train vector t p with respect to the personnel class 3> p . 

5. EXPERIMENTAL STUDY 

To evaluate the performance of the LDA-PAFF 
use of the SDUMLA-HMT multimodal biometric 
dataset [23] is considered. The SDUMLA-HMT 
data set consists of five biometric traits namely 
finger vein, iris, face, fingerprint and gait. The 
SDUMLA-HMT encompasses biometric traits of 
106 personnel. A total of 45 female and 61 males 
aged between 17 and 31 are the personnel 
considered in the dataset. To evaluate the 
performance the use of Finger Vein and Face 
biometric data from the SDUMLA-HMT dataset is 
considered. The finger vein data provides data 
about the ring finger, index finger and middle finger 
collected over six sessions. 

A total of 84 face images per personnel are 
provided. Personnel accessories, phase and 
expression variations are considered in the face data. 



Biometric 

Feature 


No of Personnel 


Biometric Data 
Per Personnel 


Total Number of 
Images 


Training Data 
Size 


Testing Data 
Size 


Finger 

Vein 


106 


36 


3816 


1908 


1908 


Face 


106 


84 


8904 


4452 


4452 



The fiW images extracted from the raw train and 
test data are converted to greyscale images and 
down sampled to 123 ft 1ZS . The Gabor kernel 
considered in the LDA-PAFF is constructed using 
3 orientations i.e. tj ■ and 5 scales 

i.e. £ ■ £lki«i^3 resulting in 40 complex filters. The 
feature fused data is obtained is dimensionally 
reduced using the function. A 

dimensional reduction of about 77% is achieved. 
The performance evaluation is carried out using 
Mat lab 2013b on an Intel i5 system. 

The Cumulative Match Characteristic( CMC ) are 
computed for the Proposed System and Existing 
System. The results obtained considering 106 p 
personnel or ranks is shown in Fig 2 . From Fig 2 it 
is observed that the Proposed System exhibits a 
better recognition rate. The CMC analysis is also 
used to compute the multimodal biometric system 
recognition rate. The CMC analysis results are 
summarized in Table 2. The cumulative system 
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recognition rates for the Proposed system and 
Existing systems is shown in Fig 3. 




Fig. 2. Performance comparison considering j£iW analysis 
for Bimodal biometric verification 



paper. From the figure the performance 
improvement considering the proposed Bimodal is 
evident considering the vein and face dataset. The 
average recognition rate of Proposed system is 
around 92.89 and of the Existing is around 90.83 as 
shown in Fig 5. The ROC computation results are 
summarized in Table 3 of this paper. 

The classification error statistics computed is 
graphically displayed in Fig 6. The receiver 
operating characteristics analysis can also be used 
to compute the verification rate normalized between 
0-1 and the False Acceptance Rate. The false 
acceptance rate results are shown Fig 7. The 
performance improvement considering the 
Proposed System is clear from the results shown. 

Table 3. ROC computation results of Proposed 

System and Existing system 




Fig.3. CMC Analysis - Cumulative System 
Recognition Rate Comparison 

Table 2. CMC Analysis Results 





NO OF 


SYSTEM 


ALGORITHM 


PERSONNEL/RANK 


RECOGNITION 




S 


RATE 


Existing 

System 


106 


89.31% 


Proposed 

System 


106 


98.43% 



To evaluate the performance of the Proposed 
system and Existing System receiver operating 
characteristics (ROC) were computed. The 
ROC curves obtained is shown in Fig 4 of this 



ALGORITHM 


ROC 

CLASSIFICATION 
ERROR (%) 


ROC 

AVERAGE 

RECOGNITION 

RATE(%) 


ROC FALSE 
ACCEPTANCE 
RATE 
(FAR %) 


Existing 

System 


5.13 


90.83 


5.12 


Proposed 

System 


0.74 


92.89 


0.74 




Fig .4. ROC Analysis Comparison Considering SDUMLA- 
HMT Dataset 
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ROC AVERAGE RECOGNITION 
RATE(%] 

93.5 
93 

3? 92.5 
« 92 
5 91.5 
g 91 
• 90.5 

Q. 

90 

89.5 

EXISTING SYSTEM PROPOSED SYSTEM 

Algorithm 



Fig.5.ROC Analysis - Average Recognition Rate 
Comparison 



ROC CLASSIFICATION ERR0R(%) 




EXISTING SYSTEM PROPOSED SYSTEM 

Algorithm 



Fig.6.ROC Analysis - Classification Error Comparison 



FALSE ACCEPTANCE RATE (FAR)(%) 




EXISTING SYSTEM PROPOSED SYSTEM 

Algorithm 



Fig.7.ROC Analysis - False Acceptance Comparison 

The EPC analysis is used to compute the unbiased 
estimates of verification performance of the 
Existing System and Proposed System . The results 
obtained is shown in Fig. 8. The average Half total 
error rate considering the proposed system is 
0.004% compared to the half total error rate of 
0.051% observed considering the existing system. 
Based on the analysis carried out and the results 
presented it can be concluded that the proposed is 
robust and exhibits better user verification when 
compared to the existing mechanism. 



EPC Curve for Bimodal Authentication 
Considering Finger Vein and Face 




rHNmcTi/TrHNmchin<HNrocr>inr-iN 

ririrsioomsj-stifiix>Lor''r>ooaTcx> 



Alpha Value 



Fig. 8. Performance comparison considering EPC analysis 
for Bimodal Authentication 
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6. Conclusions 

The use of biometrics for personnel identification is 
very common. The unimodal biometric recognition 
systems suffer from a number of drawbacks 
discussed in this paper. To over-come these 
drawbacks the LDA-PAFF is introduced in this 
paper. The LDA-PAFF considers the finger vein 
and face biometric traits for enrolment and 
recognitions of personnel into the system. The raw 
data obtained from the biometric sensors are 
preprocessed to obtain the relevant ROI'S. The use 
of Gabor Kernels is considered for feature 
extraction. The magnitude and phase features are 
considered. Limited work is carried out considering 
both these features for extraction in multimodal 
biometric systems. Fischer score and linear 
discriminate Analysis is considered for dimensional 
reduction. Feature dimension reduction of 77% is 
achieved using this methodology. For personnel 
verification the weighted k-nearest neighbor 
classifier is used. SDUMLA-HMT multimodal 
biometric dataset is used for performance 
evaluation. 

The future of the work presented in this paper is 
consideration of additional biometric traits and 
additional biometric trait combinations for building 
robust and reliable recognition systems for 
personnel identification. 
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Abstract - This paper seeks to establish the socio- 
political impacts that the globalization of the 
information technology industry has in different 
societies. The review focuses on defining the two 
concepts of IT and Globalization and later 
outlines the relationship that exists between them. 
IT is an important contributor to the process of 
globalizing different communities and economies. 
While most people might view the relationship as 
unidirectional, it is untrue since globalization also 
impacts how the IT industry develops through 
various markets. The paper focuses on the two 
objectives of IT that are automation of processes 
for better efficiency and integrating people by 
promoting communication between them. These 
goals match the definition of globalization that 
describes their connection. Globalizing IT has 
different impacts on the politics and policy 
implementations of various locations through the 
flow of information that influences the people in 
governments. Other effects include the formation 
of trade blocs that are influential on the 
performance of the IT industry. 

INTRODUCTION 

Globalization refers to a process that allows 
the interaction and integration of different societal 
entities especially regarding business investments and 
international trade [16]. Researchers argue that this is 
not a new trend but rather a continuation of 
traditional conventions and developments [14]. The 
trend has brought with it a lot of enhancements in the 
trading environments with influences on both private 
and public sectors of the economies around the globe. 
There are frequent arguments that information 
technology and globalization turn on the same page, 
others state that they are separate concepts that just 
influence one another from different dimensions [24]. 
There has been a widespread study on the two 
concepts and how they impact businesses through 
their relationship in every market in the world. 

This research will delve into the socio- 
political influences that the notion of globalization 



and the industry of information technology have on 
the business environment. The primary concern of 
the investigation is to determine how the 
globalization of the information technology sector 
impacts the societies that are in participation. The 
two areas are already experiencing much attention as 
the main influences on the economies of the world by 
enabling business transaction in many ways [4]. The 
paper will then focus on the different factors that 
make the globalization of the technological and 
communication platforms a significant contributor to 
the society. The information will come from previous 
research findings on how the two concepts relate and 
then determine the effects they create with a 
particular focus on empowering the economy. 

LITERATURE REVIEW 
What is Globalization ? 

The basic definition of globalization is that 
it is a process that enables the interdependence of 
economies and territories by integrating and 
increasing the integration of the people [1]. The idea 
of globalization has gained popularity in recent 
periods with different factors contributing to the 
acceleration of its growth. The motivation behind the 
concept was to expand the markets that each business 
had to explore by engaging more people in the 
activities [22]. This objective required that most of 
the ventures move to indulge in newer areas that 
would promise larger audiences for market products 
and also as a source of new items [26]. The territories 
aim to acquire equitable distribution of resources by 
eliminating the factors that hinder international trade, 
such as policies, to achieve the intentions of 
globalization in the market [27]. 

There are different factors that impact the 
trends of globalization, and one of them is the 
liberalization of trading by implementing flexible 
policies across the platform [7]. Most countries are 
developing towards a situation that allows the equal 
participation of both foreign and local products in 
their markets to increase competition. Most of them 
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even fail to protect their industries from the 
competitive pressure to achieve a common ground 
that allows them to take part in international activities 
[7]. Some argue that these development gives their 
local producers an opportunity in the foreign areas as 
well which is a justification for their move to allow 
external traders in their markets. The second factor 
that enables globalization is the technological 
advancement that the business industry is witnessing. 
The developments in information technology reduce 
the costs of communication and transport and make 
the processes quicker [19]. The second aspect has had 
the significant influence on the whole process of 
integrating the people around the world and their 
activities in different ways. There are other areas that 
have received the impact of technology such as better 
production techniques that are a competitive 
advantage among the communities. 

Defining Information Technology 

IT refers to the use of computing and 
telecommunications systems to transmit, retrieve, and 
store data and information [2]. Information t plays a 
very significant role in the lives of humans especially 
in providing the content they require at their 
fingertips using electronic structures to outline its 
objectives [5]. There have been rapid developments 
in this field in the recent years with systems 
improving their capacities to assist human thinking in 
different ways. The trend looks to put information 
and communication technology in a spectrum that is 
integral to various activities that people do around the 
world. There are improvements in the capabilities of 
the devices that enhance the speed and efficiency of 
the different process in the world, for example, 
through robotics. 

Information technology is an influential 
factor in developing intelligent systems and 
infrastructure that are essential to worldwide 
economies [17]. An example is the transport network 
that employs IT in different aspects to ensure 
efficiency and the creation of sufficient 
infrastructure. They allow the movement of 
commodities and services providers through various 
regions with ease [9]. They include the enhancement 
of vehicles, for example, that are easier to handle 
through implement IT devices into their operations 
and using other technologies to monitor the road 
transit processes. The progress in this area also 
promises to improve the way communities do 
business by empowering inclusive models through 
the networking strategies that it presents to them [28]. 
Therefore, IT is an important contributor to economic 
growth in many ways that include doing business in 
the field itself. The aspects of networking that result 
from the use of IT in businesses is among the factors 



that contribute to the globalization of the different 
territories and their socio-political attributes. 

Objectives of the Information Technology Industry 

The primary goal that the IT industry seeks 
to achieve is the automation of processes to make 
them quicker, more accurate and efficient [5]. The 
objective covers all areas of other industries 
including the manufacturing activities as well as 
communication and transport. These empowerments 
in businesses are significant as they impact the 
aspects of cost efficiency and process improvements. 
They can promise the better performance of different 
aspects of society by reducing human effort and in 
turn enabling people to focus on more activities [8]. 
There is wide criticism that most industry players in 
the field of IT only concentrate on the primary 
objective of developing and installing the systems 
[5]. They fail to follow up to other stages that are 
beyond the installation such as improving on 
particular aspects. They prefer to develop newer 
strategies instead of improving those that are 
currently in use. 

The second most prominent objective of the 
development in Information Technology is the 
integration of people and communities [21]. The 
globalization of IT strategies and methods ensures 
that societies can connect with one another around 
the world through the processes of sharing 
information. Businesses are the biggest beneficiaries 
of this goal as they can use it for decision-making 
about different regions around the globe. For 
example, the can tell which areas are the best 
producers and which ones can save them production 
costs for their operations. 

Components of Information Technology 

There are five elements that enable the 
efficiency of IT in various applications. Their 
combinations develop the systems that define 
information technology and its utilization [5]. They 
are Hardware, Software, Procedures, Data, and 
People. These are the internal components of the 
system that enable its functionality regardless of their 
environment. Therefore, each of them has to be 
present to define a complete system that operates 
efficiently implying that they are interdependent in 
this context. Another important component that falls 
in this category is the network that enables a 
connection between independent systems between the 
regions. It is the core descriptor of the 
communication strategies that allow the people to 
share information through different media bases. 

Hardware refers to the machinery and 
physical attributes of the system that are tangible and 
the users apply them to implement the other 
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components [5]. Software refers to the intangible 
parts of a computer that are the programs and 
applications that initiate and complete the procedures 
of the machines. Data is the essential aspect that the 
two components above use to deliver a 

communication between two systems. It develops in 
the two aspects in the form of databases that store 
particular information that is relevant to an activity of 
interest for the communities [10]. There are different 
sources of data that include the systems themselves 
through the information they store and also the 
observations and collection exercises that people 
undertake and input their findings into them. 
Procedures define the general processes of operating 
the technology and the other components by 
implementing the three parts of hardware, software 
and data. People in this context refer to the users of 
the systems that enable their operation and define 
their utility to others who are interested in the 
particular procedures. 

External aspects of Information Technology 
include those that relate to the people and their 
structures of governance and Association [15]. The 
most significant factor, in this case, is the governance 
methodologies and structures that different 
authorities implement regarding the field of IT. They 
are an influential contribution to its efficient use and 
attainment of its various objectives of the operation. 
They can impact the use of technology according to 
the preferences and perceptions of a particular 
community and their authorities. These factors are 
also important determining the opportunities that the 
industry has in the different locations and how users 
can perform its strategic alignments to the objectives. 

The advancement of information technology 
is dependent on consistent innovation in the industry 
that aims to improve the systems to support various 
activities of a particular operation [18]. After the 
development of individual devices, the developers 
seek to make improvements on them or bring new 
ideas to the landscape. They have to clear every 
challenge they face in their industry by implementing 
new machines and procedures that are more efficient 
than the current technology. With the amount of 
competition growing in the industry, every producer 
struggles to come up with innovative solutions to 
gain an advantage in the markets. It is beneficial to 
consumers as it promises better outcomes each time. 

Relationship between Globalization and the 
Information Technology Industry 

It is important to establish the connection 
that exists between the information technology 
industry and the aspect of globalization. Most 
arguments place IT as an enabler of globalization in 
different ways [24]. There is a focus on how much IT 



has impacted the participation of developing 
countries in global economic development [23]. The 
IT industry’s development in such areas has allowed 
them to have a say on the same platforms as other 
nations in the implementation of international trade. 
The primary objective of globalization is to create an 
integration of people, and they share this with 
information technology. Therefore, it is easier to 
conclude that IT is an enabler of globalization. As 
much as this is true, the relationship does not go one 
way. There are many influences of globalization that 
impact the IT industry as well. Some of them include 
governance on the production and use of technology 
in various territories that can control the consumption 
of IT. 

Socio-Political Influences of Globalizing the 
Information Technology Industry 

Migration counts among the most prominent 
outcomes of intensive globalization as people go to 
explore new markets and opportunities in foreign 
areas [11]. The impact of globalizing the industry of 
IT also promises to encourage the same trend as the 
producers will venture into new areas to promote 
their businesses. There is the creation of jobs through 
the industry that can also motivate the migration of 
people from different regions to take this opportunity. 
The ease of movement and communication around 
the globe by most important is another area that 
information technology has enabled [12]. People can 
now access information about a particular place and 
its opportunities using such media as the Internet and 
organization their immigrations as well through it. 
The trend has led to the integration of different 
locations by bringing in new actors either for 
business or other purposes as a result of globalizing 
the industry of information technology. There is also 
a pattern that shows the willingness of every country 
to encourage research and development on the 
aspects of technology and it implies a future 
devolution of developers to every state. These 
circumstances may increase traffic flow between the 
territories to ensure the exploration of all ventures 
and opportunities. There are concerns that each local 
community will feel the threat from external 
participants on the flow of labor, capital, and 
information. Most of them encourage the regulation 
of the migrations to protect the indigenous people 
from these threats [20]. 

Globalization has led to the creation of trade 
blocs which reduces the significance of nation states 
[6]. The collective strategies to address regional 
issues eliminate the absolute sovereignty that most 
countries used to practice. It enables them to achieve 
a universal objective by pooling their resources and 
strategies that are an impact to the traditional political 
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system that relied on the sovereign rule. Most of the 
economies develop legislations that touch on 
different aspects of their politics to ensure they 
comply with the agreements that they make in the 
unions. The impact of international trade on such 
economies can get the influence of the IT industry as 
it seeks to expand to newer regions. The businesses 
will try to indulge in areas that are politically stable 
with democracy as the primary factor of concern for 
them [6]. This requirement from the enterprises of the 
IT industry can influence the way most states practice 
their governance to allow them to be a participative 
market. 

There is another influence that results from 
the abilities of the internet and social media to 
connect people around the world on their politics. 
They bring the possibility of people across the oceans 
to make commentaries about the political situations 
of other countries that have an influence on how 
governments implement their policies (Weare). It has 
also enabled the empowerment of nongovernmental 
organizations and the civil society in participating in 
different forums through social and mass media [6]. 
The participation of the country’s citizens has also 
gained significant improvement because they now 
have a platform that enables others to hear their voice 
in governance. The globalization of the Internet to 
more regions means that each action has a wider 
audience and consequentially more influence even 
from foreign places. Most civil society players use 
the internet as an avenue to hold campaigns that are 
in support or against particular political activities in 
their states. Another political aspect that has received 
influence from the development in IT is the 
empowerment of women around the globe [25]. 

There is an increasing convergence between 
mass media and information technology resulting 
from the globalization of the latter industry [13]. The 
industry of information technology sector plays a 
significant role in the spread of information and 
producers of the media have sought to take advantage 
of the capabilities. There are two impacts that will 
result from this development, and the first is the 
liberalization of international trade [21]. The trend 
will promote the focus on improving multinational 
corporations as there is a focus on how each country 
reacts to them by the media. Among the enterprises 
that will benefit from this are the players in the IT 
industry as they now experience flexible market 
regimes. They also have a direct market for their 
products as the mass media producers will use them 
to outline their objectives. The second effect of the 
converging trend between IT and media regards 
multilateral cooperation between different societies 
through integration [21]. The availability of 
information through these two components of 



communication will encourage the association 
between communities as they now have clarity on the 
similarities and differences between their cultures. It 
facilitates the efficient implementation of the policies 
and regulations of international trade that are a 
significant influence on the business structures. They 
are also an actor that allows the confidence to explore 
foreign markets as there is sufficient knowledge 
about the environments and the nature of trading. 

CONCLUSION 

The paper reveals the significance of 
globalization and how it impacts various industries 
and among them information technology. The 
relationship between the two concepts also shows 
that they are interdependent with each having a way 
of limiting or empowering the other. IT is the factor 
that has increased the processes of globalizing 
different communities and their economies. It makes 
IT an important contributor in ways such as the 
creation of employment and enabling quick 
communications that facilitate business efficiency. 
The globalization of the IT industry has different 
impacts on the integration of communities as well as 
an influence on the structures of governance. It also 
affects migration patterns depending on the 
information that people get about particular regions 
and their business environments. It affects the way 
that civil societies and nongovernmental 
organizations perform their activities as well by 
enabling the spread of knowledge and information 
about a particular territory. 
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access control rules, Middleware-oriented security patterns: identification of patterns for sound, reusable 
security, Security in aspect-based middleware: mechanisms for isolating and enforcing security aspects, 
Security in agent-based platforms: protection for mobile code and platforms, Smart Devices: Biometrics, 
National ID cards, Embedded Systems Security and TPMs, RFID Systems Security, Smart Card Security, 
Pervasive Systems: Digital Rights Management (DRM) in pervasive environments, Intrusion Detection and 
Information Filtering, Localization Systems Security (Tracking of People and Goods), Mobile Commerce 
Security, Privacy Enhancing Technologies, Security Protocols (for Identification and Authentication, 
Confidentiality and Privacy, and Integrity), Ubiquitous Networks: Ad Hoc Networks Security, Delay- 
Tolerant Network Security, Domestic Network Security, Peer-to-Peer Networks Security, Security Issues 
in Mobile and Ubiquitous Networks, Security of GSM/GPRS/UMTS Systems, Sensor Networks Security, 
Vehicular Network Security, Wireless Communication Security: Bluetooth, NFC, WiFi, WiMAX, 
WiMedia, others 



This Track will emphasize the design, implementation, management and applications of computer 
communications, networks and services. Topics of mostly theoretical nature are also welcome, provided 
there is clear practical potential in applying the results of such work. 

Track B: Computer Science 

Broadband wireless technologies: LTE, WiMAX, WiRAN, HSDPA, HSUPA, Resource allocation and 
interference management, Quality of service and scheduling methods, Capacity planning and dimensioning, 
Cross-layer design and Physical layer based issue, Interworking architecture and interoperability, Relay 
assisted and cooperative communications, Location and provisioning and mobility management, Call 
admission and flow/congestion control, Performance optimization, Channel capacity modeling and analysis, 
Middleware Issues: Event-based, publish/subscribe, and message-oriented middleware, Reconfigurable, 
adaptable, and reflective middleware approaches, Middleware solutions for reliability, fault tolerance, and 
quality-of-service, Scalability of middleware, Context-aware middleware, Autonomic and self-managing 
middleware, Evaluation techniques for middleware solutions, Formal methods and tools for designing, 
verifying, and evaluating, middleware, Software engineering techniques for middleware, Service oriented 
middleware, Agent-based middleware, Security middleware, Network Applications: Network-based 
automation, Cloud applications, Ubiquitous and pervasive applications, Collaborative applications, RFID 
and sensor network applications, Mobile applications, Smart home applications, Infrastructure monitoring 
and control applications, Remote health monitoring, GPS and location-based applications, Networked 
vehicles applications, Alert applications, Embeded Computer System, Advanced Control Systems, and 
Intelligent Control : Advanced control and measurement, computer and microprocessor-based control, 
signal processing, estimation and identification techniques, application specific IC’s, nonlinear and 
adaptive control, optimal and robot control, intelligent control, evolutionary computing, and intelligent 
systems, instrumentation subject to critical conditions, automotive, marine and aero-space control and all 
other control applications, Intelligent Control System, WiringAVireless Sensor, Signal Control System. 
Sensors, Actuators and Systems Integration : Intelligent sensors and actuators, multisensor fusion, sensor 
array and multi-channel processing, micro/nano technology, microsensors and microactuators, 
instrumentation electronics, MEMS and system integration, wireless sensor, Network Sensor, Hybrid 




Sensor, Distributed Sensor Networks. Signal and Image Processing : Digital signal processing theory, 
methods, DSP implementation, speech processing, image and multidimensional signal processing, Image 
analysis and processing, Image and Multimedia applications, Real-time multimedia signal processing, 
Computer vision, Emerging signal processing areas, Remote Sensing, Signal processing in education. 
Industrial Informatics: Industrial applications of neural networks, fuzzy algorithms, Neuro-Fuzzy 
application, bioinformatics, real-time computer control, real-time information systems, human-machine 
interfaces, C AD/C AM/C AT/CIM, virtual reality, industrial communications, flexible manufacturing 
systems, industrial automated process, Data Storage Management, Harddisk control, Supply Chain 
Management, Logistics applications, Power plant automation, Drives automation. Information Technology, 
Management of Information System : Management information systems, Information Management, 
Nursing information management, Information System, Information Technology and their application, Data 
retrieval, Data Base Management, Decision analysis methods, Information processing, Operations research, 
E-Business, E-Commerce, E-Government, Computer Business, Security and risk management, Medical 
imaging, Biotechnology, Bio-Medicine, Computer-based information systems in health care, Changing 
Access to Patient Information, Healthcare Management Information Technology. 
Communication/Computer Network, Transportation Application : On-board diagnostics, Active safety 
systems, Communication systems, Wireless technology, Communication application, Navigation and 
Guidance, Vision-based applications, Speech interface, Sensor fusion, Networking theory and technologies, 
Transportation information, Autonomous vehicle, Vehicle application of affective computing, Advance 
Computing technology and their application : Broadband and intelligent networks, Data Mining, Data 
fusion, Computational intelligence, Information and data security, Information indexing and retrieval, 
Information processing, Information systems and applications, Internet applications and performances, 
Knowledge based systems, Knowledge management, Software Engineering, Decision making, Mobile 
networks and services, Network management and services, Neural Network, Fuzzy logics, Neuro-Fuzzy, 
Expert approaches, Innovation Technology and Management : Innovation and product development, 
Emerging advances in business and its applications, Creativity in Internet management and retailing, B2B 
and B2C management, Electronic transceiver device for Retail Marketing Industries, Facilities planning 
and management, Innovative pervasive computing applications, Programming paradigms for pervasive 
systems, Software evolution and maintenance in pervasive systems, Middleware services and agent 
technologies, Adaptive, autonomic and context-aware computing, Mobile/Wireless computing systems and 
services in pervasive computing, Energy-efficient and green pervasive computing, Communication 
architectures for pervasive computing, Ad hoc networks for pervasive communications, Pervasive 
opportunistic communications and applications, Enabling technologies for pervasive systems (e.g., wireless 
BAN, PAN), Positioning and tracking technologies, Sensors and RFID in pervasive systems, Multimodal 
sensing and context for pervasive applications, Pervasive sensing, perception and semantic interpretation, 
Smart devices and intelligent environments, Trust, security and privacy issues in pervasive systems, User 
interfaces and interaction models, Virtual immersive communications, Wearable computers, Standards and 
interfaces for pervasive computing environments, Social and economic models for pervasive systems, 
Active and Programmable Networks, Ad Hoc & Sensor Network, Congestion and/or Flow Control, Content 
Distribution, Grid Networking, High-speed Network Architectures, Internet Services and Applications, 
Optical Networks, Mobile and Wireless Networks, Network Modeling and Simulation, Multicast, 
Multimedia Communications, Network Control and Management, Network Protocols, Network 
Performance, Network Measurement, Peer to Peer and Overlay Networks, Quality of Service and Quality 
of Experience, Ubiquitous Networks, Crosscutting Themes - Internet Technologies, Infrastructure, 
Services and Applications; Open Source Tools, Open Models and Architectures; Security, Privacy and 
Trust; Navigation Systems, Location Based Services; Social Networks and Online Communities; ICT 
Convergence, Digital Economy and Digital Divide, Neural Networks, Pattern Recognition, Computer 
Vision, Advanced Computing Architectures and New Programming Models, Visualization and Virtual 
Reality as Applied to Computational Science, Computer Architecture and Embedded Systems, Technology 
in Education, Theoretical Computer Science, Computing Ethics, Computing Practices & Applications 
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